INFORMATION GEOMETRY FORMALISM FOR THE SPATIALLY 
HOMOGENEOUS BOLTZMANN EQUATION 

BERTRAND LODS AND GIOVANNI PISTONE 


Abstract. Information Geometry generalizes to infinite dimension by modeling the 
tangent space of the relevant manifold of probability densities with exponential Orlicz 
spaces. We review here several properties of the exponential manifold on a suitable set $\mathcal{E}$ 
of mutually absolutely continuous densities. We study in particular the fine properties 
of the Kullback-Leibler divergence in this context. We also show that this setting is 
well-suited for the study of the spatially homogeneous Boltzmann equation if $\mathcal{E}$ is a set 
of positive densities with finite relative entropy with respect to the Maxwell density. 
More precisely, we analyse the Boltzmann operator in the geometric setting from the 
point of view of its Maxwell's weak form as a composition of elementary operations in the 
exponential manifold, namely tensor product, conditioning and marginalization, and we prove in 
a geometric way the basic facts, i.e., the H-theorem. We also illustrate the robustness of 
our method by discussing, besides the Kullback-Leibler divergence, the properties of the 
Hyvärinen divergence. This requires generalising our approach to Orlicz-Sobolev spaces 
to include derivatives. 
1. Introduction 

Information geometry (IG) has been essentially developed by S.-I. Amari, see the 
monograph by Amari and Nagaoka [1]. In his work, all previous geometric (essentially metric) 
descriptions of probabilistic and statistical concepts are extended in the direction of affine 
differential geometry, including the fundamental treatment of connections. A 
corresponding concept for abstract manifolds, called statistical manifold, has been worked out by S. 
L. Lauritzen in [?]. Amari's framework is today considered a case of Hessian geometry 
as it is described in the monograph by H. Shima [38]. 

Other versions of IG have been studied to deal with non-parametric settings such 
as the Boltzmann equation, as described in [13] and [?]. A very general set-up 
for information geometry is the following. Consider a one-dimensional family of positive 
densities with respect to a measure $\mu$, $\theta \mapsto p_\theta$, and a random variable $U$. A classical 
statistical computation, possibly due to Ronald Fisher, is 

$$\frac{d}{d\theta} \int U(x)\, p(x;\theta)\, \mu(dx) = \int U(x)\, \frac{d}{d\theta} p(x;\theta)\, \mu(dx) = \int U(x) \left(\frac{d}{d\theta} \log p(x;\theta)\right) p(x;\theta)\, \mu(dx) = \mathbb{E}_\theta\left[(U - \mathbb{E}_\theta[U])\, \frac{d}{d\theta} \log p_\theta\right] .$$

The previous computation suggests the following geometric construction, which is 
rigorous if the sample space is finite and can be forced to work in general under suitable 
assumptions. We use the language of differential geometry, e.g., [?]. If $\Delta$ is the probability 
simplex on a given sample space $(\Omega, \mathcal{F})$, we define the statistical bundle of $\Delta$ to be 

$$T\Delta = \{(\pi, u) \mid \pi \in \Delta,\ u \in L^2(\pi),\ \mathbb{E}_\pi[u] = 0\} .$$

Given a one-dimensional curve in $\Delta$, $\theta \mapsto \pi_\theta$, we can define its velocity to be the curve 

$$\theta \mapsto (\pi_\theta, D\pi_\theta) \in T\Delta ,$$

where we define 

$$D\pi_\theta = \frac{d}{d\theta} \log \pi_\theta = \frac{\frac{d}{d\theta}\pi_\theta}{\pi_\theta} .$$

Each fiber $T_\pi\Delta = L^2_0(\pi)$ has a scalar product and we have a parallel transport 

$$\mathbb{U}_\pi^\nu \colon T_\pi\Delta \ni V \mapsto V - \mathbb{E}_\nu[V] \in T_\nu\Delta .$$

This structure provides an interesting framework to interpret the Fisher computation 
cited above. The basic case of a finite state space has been extended by Amari and 
coworkers to the case of a parametric set of strictly positive probability densities on a 
generic sample space. Following a suggestion by A. P. Dawid in [?], a particular 
nonparametric version of that theory was developed in a series of papers [?], 
where the set $\mathcal{P}_>$ of all strictly positive probability densities of 
a measure space is shown to be a Banach manifold (as defined in [?]) modeled 
on an Orlicz Banach space, see e.g., [?, Chapter II]. 


In the present paper, Sec. 2 recalls the theory and our notation about the model Orlicz 
spaces. This material is included for convenience only and can be skipped 
by any reader familiar with the papers [?] quoted 
above. Sec. 3 is mostly based on the same references and is intended to 
introduce the manifold structure and to give a first example of application to the study of the 
Kullback-Leibler divergence. The special features of statistical manifolds that contain the 
Maxwell density are discussed in Sec. 4. In this case we can define the Boltzmann-Gibbs 
entropy and study its gradient flow. The setting for the Boltzmann equation is discussed 
in Sec. 5, where we show that the equation can be derived from probabilistic operations 
performed on the statistical manifold. Our application to the study of the 
Kullback-Leibler divergence is generalised in a subsequent section to the more delicate case of the Hyvärinen 
divergence. This requires in particular a generalisation of the manifold structure to 
include differential operators and leads naturally to the introduction of Orlicz-Sobolev 
spaces. 

We are aware that there are other approaches to non-parametric information geometry 
that are not based on the notion of the exponential family and that we do not consider 
here. We mention in particular [29] and [?]. 

2. Model Spaces 

Given a $\sigma$-finite measure space, $(\Omega, \mathcal{F}, \mu)$, we denote by $\mathcal{P}_>$ the set of all densities that 
are positive $\mu$-a.s., by $\mathcal{P}_\ge$ the set of all densities, and by $\mathcal{P}_1$ the set of measurable functions $f$ 
with $\int f \, d\mu = 1$. 

We introduce here the Orlicz spaces we shall mainly investigate in the sequel. We 
refer to [13] and [?, Chapter II] for more details on the matter. We consider the Young 
function 

$$\mathbb{R} \ni x \mapsto \Phi(x) = \cosh x - 1$$

and, for any $p \in \mathcal{P}_>$, the Orlicz space $L^\Phi(p) = L^{(\cosh-1)}(p)$ is defined as follows: a real 
random variable $U$ belongs to $L^\Phi(p)$ if 

$$\mathbb{E}_p[\Phi(\alpha U)] < +\infty \quad \text{for some } \alpha > 0 .$$

The Orlicz space $L^\Phi(p)$ is a Banach space when endowed with the Luxemburg norm 
defined as 

$$\|U\|_{\Phi,p} = \inf\{\lambda > 0 \mid \mathbb{E}_p[\Phi(U/\lambda)] \le 1\} .$$
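
As an illustration of the definition above, the Luxemburg norm can be approximated numerically on a finite sample space. The following is a minimal sketch, not part of the original text: the function names `phi` and `luxemburg_norm`, the bisection tolerance, and the assumption that the density is given as a vector of strictly positive probabilities are all choices made here for illustration.

```python
import math


def phi(x: float) -> float:
    """Young function Phi(x) = cosh(x) - 1."""
    return math.cosh(x) - 1.0


def luxemburg_norm(u, p, rel_tol=1e-12):
    """Approximate inf{lam > 0 : E_p[Phi(u / lam)] <= 1} on a finite sample space.

    u: values of the random variable; p: strictly positive probabilities
    (an assumption of this sketch).
    """
    if all(x == 0.0 for x in u):
        return 0.0

    def mean_phi(lam: float) -> float:
        return sum(pi * phi(ui / lam) for ui, pi in zip(u, p))

    # At lam = max|u| every term is at most cosh(1) - 1 < 1, so the constraint holds.
    hi = max(abs(x) for x in u)
    lo = hi
    # Shrink lo until the constraint fails; lam -> mean_phi(lam) is decreasing.
    while mean_phi(lo) <= 1.0:
        lo /= 2.0
    # Bisection on the bracket [lo, hi].
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if mean_phi(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi
```

For a symmetric two-point variable $U = \pm 1$ with equal weights the constraint reads $\cosh(1/\lambda) - 1 \le 1$, so the sketch should return a value close to $1/\operatorname{arcosh}(2)$.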
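The conjugate function of $\Phi = \cosh - 1$ is 

$$\Phi_*(y) = (\cosh - 1)_*(y) = \int_0^y \operatorname{arsinh} u \, du = y \operatorname{arsinh} y + 1 - \sqrt{1 + y^2}, \qquad y \in \mathbb{R},$$

which satisfies the so-called $\Delta_2$-condition as 

$$\Phi_*(\alpha y) = \int_0^{|\alpha y|} (|\alpha y| - u)\, \Phi_*''(u) \, du = \alpha^2 \int_0^{|y|} \frac{|y| - v}{\sqrt{1 + \alpha^2 v^2}} \, dv \le \max(1, \alpha^2)\, \Phi_*(y) .$$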
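
The closed form of the conjugate function and the $\Delta_2$ bound can be checked numerically. The sketch below is illustrative only: the helper `phi_star_by_sup` approximates the Legendre conjugate $\sup_x (xy - \Phi(x))$ on a grid whose range and resolution are arbitrary choices made here, not something taken from the text.

```python
import math


def phi(x: float) -> float:
    """Young function Phi(x) = cosh(x) - 1."""
    return math.cosh(x) - 1.0


def phi_star(y: float) -> float:
    """Closed form of the conjugate: y * arsinh(y) + 1 - sqrt(1 + y^2)."""
    return y * math.asinh(y) + 1.0 - math.sqrt(1.0 + y * y)


def phi_star_by_sup(y: float, x_max: float = 10.0, n: int = 20001) -> float:
    """Crude grid approximation of sup_x (x * y - Phi(x)) over [-x_max, x_max]."""
    step = 2.0 * x_max / (n - 1)
    return max((-x_max + i * step) * y - phi(-x_max + i * step) for i in range(n))


def young_gap(x: float, y: float) -> float:
    """Phi(x) + Phi_*(y) - x * y, non-negative by Young's inequality."""
    return phi(x) + phi_star(y) - x * y


def delta2_holds(alpha: float, y: float) -> bool:
    """Check Phi_*(alpha * y) <= max(1, alpha^2) * Phi_*(y), up to rounding."""
    return phi_star(alpha * y) <= max(1.0, alpha * alpha) * phi_star(y) + 1e-12
```

The grid supremum is only an approximation, so agreement with the closed form should be expected up to the grid resolution.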


Since $\Phi_*$ is a Young function, for any $p \in \mathcal{P}_>$, one can define as above the associated 
Orlicz space $L^{\Phi_*}(p) = L^{(\cosh-1)_*}(p)$ and its corresponding Luxemburg norm $\|\cdot\|_{\Phi_*,p}$. 

Because the functions $\Phi$ and $\Phi_*$ form a Young pair, for each $U \in L^{(\cosh-1)}(p)$ and $V \in L^{(\cosh-1)_*}(p)$ 
we can deduce from Young's inequality 

$$xy \le \Phi(x) + \Phi_*(y) \qquad \forall x, y \in \mathbb{R}$$

that the following Hölder's inequality holds: 

$$|\mathbb{E}_p[UV]| \le 2\, \|U\|_{(\cosh-1),p}\, \|V\|_{(\cosh-1)_*,p} < \infty .$$

Moreover, it is a classical result that the space $L^{(\cosh-1)_*}(p)$ is separable and its dual 
space is $L^{(\cosh-1)}(p)$, the duality pairing being 

$$(V, U) \in L^{(\cosh-1)_*}(p) \times L^{(\cosh-1)}(p) \mapsto \langle U, V \rangle_p := \mathbb{E}_p[UV] .$$

We recall the following continuous embedding result that we shall use repeatedly in the 
paper: 

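Theorem 1. Given $p \in \mathcal{P}_>$, for any $1 < r < \infty$, the following embeddings 

$$L^\infty(p) \hookrightarrow L^{(\cosh-1)}(p) \hookrightarrow L^r(p) \hookrightarrow L^{(\cosh-1)_*}(p) \hookrightarrow L^1(p)$$

are continuous. 

From this result, we easily deduce the following useful lemma. 

Lemma 2. Let $p \in \mathcal{P}_>$ and $k \ge 1$ be given. For any $u_1, \dots, u_k \in L^{(\cosh-1)}(p)$, 

$$\prod_{i=1}^k u_i \in \bigcap_{1 < r < \infty} L^r(p) .$$

Proof. According to Theorem 1, $u_i \in L^r(p)$ for any $1 < r < \infty$ and any $i = 1, \dots, k$. The 
proof then follows simply from the repeated use of Hölder's inequality. □ 

From now on we also define, for any $p \in \mathcal{P}_>$, 

$$B_p = L_0^{(\cosh-1)}(p) := \{u \in L^{(\cosh-1)}(p) \mid \mathbb{E}_p[u] = 0\} .$$

In the same way, we set ${}^*B_p = L_0^{(\cosh-1)_*}(p) := \{u \in L^{(\cosh-1)_*}(p) \mid \mathbb{E}_p[u] = 0\}$. 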

2.1. Cumulant Generating Functional. Let $p \in \mathcal{P}_>$ be given. With the above 
notations one can define: 

Definition 3. The cumulant generating functional is the mapping 

$$K_p \colon u \in B_p \mapsto \log \mathbb{E}_p[e^u] \in [0, +\infty] .$$

The following result [?] shows the properties of the exponential function as a 
superposition mapping [6]. 

Proposition 4. Let $a \ge 1$ and $p \in \mathcal{P}_>$ be given. 

(1) For any $n = 0, 1, \dots$ and $u \in L^\Phi(p)$: 

$$\lambda_{a,n}(u) \colon (w_1, \dots, w_n) \mapsto \frac{w_1}{a} \cdots \frac{w_n}{a}\, e^{u/a}$$

is a continuous, symmetric, $n$-multi-linear map from $(L^\Phi(p))^n$ to $L^a(p)$. 

(2) $v \mapsto \sum_{n=0}^\infty \frac{1}{n!} \left(\frac{v}{a}\right)^n$ is a power series from $L^\Phi(p)$ to $L^a(p)$, with radius of 
convergence larger than 1. 

(3) The superposition mapping, $v \mapsto e^{v/a}$, is an analytic function from the open unit 
ball of $L^\Phi(p)$ to $L^a(p)$. 
The cumulant generating functional enjoys the following properties (see [?]): 

Proposition 5. 

(1) $K_p(0) = 0$; otherwise, for each $u \ne 0$, $K_p(u) > 0$. 

(2) $K_p$ is convex and lower semi-continuous, and its proper domain 

$$\operatorname{dom}(K_p) = \{u \in B_p \mid K_p(u) < \infty\}$$

is a convex set that contains the open unit ball of $B_p$. 

(3) $K_p$ is infinitely Gâteaux-differentiable in the interior of its proper domain. 

(4) $K_p$ is bounded, infinitely Fréchet-differentiable and analytic on the open unit ball 
of $B_p$. 

Remark 1. One sees from property (2) above that the interior of the proper domain of 
$K_p$ is a non-empty open convex set. From now on we shall adopt the notation 

$$\mathcal{S}_p = \operatorname{Int}(\operatorname{dom}(K_p)) .$$

Other properties of the functional Kp are described below, as they relate directly to 
the exponential manifold. 


3. Exponential manifold 

The set of positive densities, $\mathcal{P}_>$, locally around a given $p \in \mathcal{P}_>$, is modeled by the 
subspace of centered random variables in the Orlicz space $L^{(\cosh-1)}(p)$. Hence, it is crucial 
to discuss the isomorphism of the model spaces for different $p$'s in order to show the 
existence of an atlas defining a Banach manifold. 

Definition 6 (Statistical exponential manifold [13, Def. 20]). For $p \in \mathcal{P}_>$, the 
statistical exponential manifold at $p$ is 

$$\mathcal{E}(p) = \left\{ e^{u - K_p(u)} \cdot p \mid u \in \mathcal{S}_p \right\} .$$

We also need the following definition of connection. 

Definition 7 (Connected densities). Densities $p, q \in \mathcal{P}_>$ are connected by an open 
exponential arc if there exists an open exponential family containing both, i.e., if for a 
neighborhood $I$ of $[0,1]$ 

$$\int_\Omega p^{1-t} q^t \, d\mu = \mathbb{E}_p\left[\left(\frac{q}{p}\right)^t\right] = \mathbb{E}_q\left[\left(\frac{p}{q}\right)^{1-t}\right] < +\infty, \qquad t \in I .$$

In such a case, one simply writes $p \smile q$. 

Theorem 8 (Portmanteau theorem [13, Th 19 and 21], [37, Theorem 4.7]). Let 
$p, q \in \mathcal{P}_>$. The following statements are equivalent: 

(1) $p \smile q$ (i.e. $p$ and $q$ are connected by an open exponential arc); 

(2) $q \in \mathcal{E}(p)$; 

(3) $\mathcal{E}(p) = \mathcal{E}(q)$; 

(4) $\log \frac{q}{p} \in L^\Phi(p) \cap L^\Phi(q)$; 

(5) $L^\Phi(p) = L^\Phi(q)$ (i.e. they both coincide as vector spaces and their norms are 
equivalent); 

(6) There exists $\varepsilon > 0$ such that 

$$\frac{q}{p} \in L^{1+\varepsilon}(p) \quad \text{and} \quad \frac{p}{q} \in L^{1+\varepsilon}(q) .$$

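We can now define the charts and atlas of the exponential manifold as follows: 

Definition 9 (Exponential manifold [?]). For each $p \in \mathcal{P}_>$, define the 
charts at $p$ as: 

$$s_p \colon q \in \mathcal{E}(p) \mapsto \log\left(\frac{q}{p}\right) - \mathbb{E}_p\left[\log\left(\frac{q}{p}\right)\right] \in \mathcal{S}_p ,$$

with inverse 

$$s_p^{-1} = e_p \colon u \in \mathcal{S}_p \mapsto e^{u - K_p(u)} \cdot p \in \mathcal{E}(p) \subset \mathcal{P}_> .$$

The atlas, $\{s_p \colon \mathcal{S}_p \mid p \in \mathcal{P}_>\}$, is affine and defines the exponential (statistical) manifold $\mathcal{P}_>$. 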
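
On a finite sample space the chart $s_p$ and its inverse $e_p$ reduce to elementary vector operations, which makes the definition easy to test. The following sketch assumes a finite sample space with counting reference measure and strictly positive probability vectors; the names `expect`, `K`, `chart` and `inverse_chart`, as well as the numerical values, are hypothetical and chosen for this illustration.

```python
import math


def expect(p, f):
    """Expectation of f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))


def K(p, u):
    """Cumulant generating functional K_p(u) = log E_p[e^u]."""
    return math.log(expect(p, [math.exp(x) for x in u]))


def chart(p, q):
    """s_p(q) = log(q/p) - E_p[log(q/p)]."""
    log_ratio = [math.log(qi / pi) for qi, pi in zip(q, p)]
    mean = expect(p, log_ratio)
    return [x - mean for x in log_ratio]


def inverse_chart(p, u):
    """e_p(u) = exp(u - K_p(u)) * p."""
    k = K(p, u)
    return [math.exp(x - k) * pi for x, pi in zip(u, p)]


p = [0.2, 0.3, 0.5]
q = [0.5, 0.25, 0.25]
u = chart(p, q)                # coordinate of q in the chart centered at p
q_back = inverse_chart(p, u)   # should reproduce q up to rounding
```

In this finite setting the coordinate `u` is centered under `p`, and `K(p, u)` coincides with $\mathbb{E}_p[\log(p/q)]$, in agreement with the non-negativity of $K_p$ stated in Proposition 5.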

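We collect here various results from [?] about additional properties of $K_p$. 

Proposition 10. Let $q = e_p(u) \in \mathcal{E}(p)$ with $u \in \mathcal{S}_p$. 

(1) The first three derivatives of $K_p$ on $\mathcal{S}_p$ are: 

$$dK_p(u)[v] = \mathbb{E}_q[v] ,$$
$$d^2K_p(u)[v_1, v_2] = \operatorname{Cov}_q(v_1, v_2) ,$$
$$d^3K_p(u)[v_1, v_2, v_3] = \operatorname{Cov}_q(v_1, v_2, v_3) .$$

(2) The random variable $\frac{q}{p} - 1$ belongs to ${}^*B_p$ and: 

$$dK_p(u)[v] = \mathbb{E}_p\left[\left(\frac{q}{p} - 1\right) v\right] .$$

In other words, the gradient of $K_p$ at $u$ is identified with an element of the predual 
space of $B_p$, viz. ${}^*B_p$, denoted by $\nabla K_p(u) = e^{u - K_p(u)} - 1 = \frac{q}{p} - 1$. 

(3) The weak derivative of the map $\mathcal{S}_p \ni u \mapsto \nabla K_p(u) \in {}^*B_p$, at $u$ applied to $w \in B_p$, 
is given by: 

$$d(\nabla K_p(u))[w] = \frac{q}{p} \left(w - \mathbb{E}_q[w]\right)$$

and it is one-to-one at each point. 

(4) $q/p \in L^{\Phi_*}(p)$. 

(5) $B_q$ is defined by an orthogonality property: 

$$B_q = L_0^{(\cosh-1)}(q) = \left\{ u \in L^{(\cosh-1)}(p) \;\middle|\; \mathbb{E}_p\left[u\, \frac{q}{p}\right] = 0 \right\} .$$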


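On the basis of the above result, it appears natural to define the following parallel 
transports: 

Definition 11. 

(1) The exponential transport ${}^e\mathbb{U}_p^q \colon B_p \to B_q$ is computed as 

$${}^e\mathbb{U}_p^q u = u - \mathbb{E}_q[u], \qquad u \in B_p .$$

(2) The mixture transport ${}^m\mathbb{U}_q^p \colon {}^*B_q \to {}^*B_p$ is computed as 

$${}^m\mathbb{U}_q^p v = \frac{q}{p}\, v, \qquad v \in {}^*B_q .$$

One has the following properties. 

Proposition 12. Let $p, q \in \mathcal{P}_>$ be given. Then 

(1) ${}^e\mathbb{U}_p^q$ is an isomorphism of $B_p$ onto $B_q$. 

(2) ${}^m\mathbb{U}_q^p$ is an isomorphism of ${}^*B_q$ onto ${}^*B_p$. 

(3) The mixture transport ${}^m\mathbb{U}_q^p$ and the exponential transport ${}^e\mathbb{U}_p^q$ are dual of each 
other: if $u \in B_p$ and $v \in {}^*B_q$ then $\langle {}^e\mathbb{U}_p^q u, v \rangle_q = \langle u, {}^m\mathbb{U}_q^p v \rangle_p$. 

(4) ${}^e\mathbb{U}_p^q$ and ${}^m\mathbb{U}_p^q$ together transport the duality pairing: if $u \in B_p$ and $v \in {}^*B_p$, then 
$\langle u, v \rangle_p = \langle {}^e\mathbb{U}_p^q u, {}^m\mathbb{U}_p^q v \rangle_q$. 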
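
The duality relations between the two transports can be verified directly on a finite sample space. The sketch below is only an illustration under that assumption; the helper names (`center`, `exp_transport`, `mix_transport`, `pairing`) and the numerical values are hypothetical.

```python
def expect(p, f):
    """Expectation of f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))


def center(f, p):
    """Subtract the p-mean, producing a p-centered random variable."""
    m = expect(p, f)
    return [x - m for x in f]


def exp_transport(u, q):
    """Exponential transport to the fiber at q: u -> u - E_q[u]."""
    return center(u, q)


def mix_transport(v, src, dst):
    """Mixture transport from the fiber at src to the fiber at dst: v -> (src/dst) v."""
    return [s / d * x for s, d, x in zip(src, dst, v)]


def pairing(u, v, p):
    """Duality pairing <u, v>_p = E_p[u v]."""
    return expect(p, [a * b for a, b in zip(u, v)])


p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
u = center([1.0, -2.0, 0.5], p)      # element of B_p
v_q = center([0.3, 1.0, -0.7], q)    # element of *B_q
v_p = center([0.3, 1.0, -0.7], p)    # element of *B_p

# Item (3): the two transports are dual of each other.
lhs_dual = pairing(exp_transport(u, q), v_q, q)
rhs_dual = pairing(u, mix_transport(v_q, q, p), p)

# Item (4): together they transport the duality pairing.
lhs_pair = pairing(u, v_p, p)
rhs_pair = pairing(exp_transport(u, q), mix_transport(v_p, p, q), q)
```

On a finite space all Orlicz spaces coincide with the space of all functions, so only the centering conditions matter in this check.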

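We reproduce here a scheme of how the affine manifold works. The domains of the 
charts centered at $p$ and $q$ respectively are either disjoint or equal if $p \smile q$: 

$$\begin{array}{ccccccc}
\mathcal{E}(p) & \xrightarrow{\ s_p\ } & \mathcal{S}_p & \subset & B_p & \subset & L^{(\cosh-1)}(p) \\
\Big\| & & \Big\downarrow {\scriptstyle s_q \circ s_p^{-1}} & & \Big\downarrow {\scriptstyle d(s_q \circ s_p^{-1})} & & \Big\| \\
\mathcal{E}(q) & \xrightarrow{\ s_q\ } & \mathcal{S}_q & \subset & B_q & \subset & L^{(\cosh-1)}(q)
\end{array}$$
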


Our discussion of the tangent bundle of the exponential manifold is based on the 
concept of the velocity of a curve as in [?, §3.3] and it is mainly intended to underline 
its statistical interpretation, which is obtained by identifying curves with one-parameter 
statistical models. For a statistical model $p(t)$, $t \in I$, the random variable $\dot{p}(t)/p(t)$ 
(which corresponds to the Fisher score) has zero expectation with respect to $p(t)$, and 
its meaning in the exponential manifold is velocity; see [?] on exponential families. 
More precisely, let $p(\cdot) \colon I \to \mathcal{E}(p)$, with $I$ an open real interval containing zero. In the chart 
centered at $p$, the curve is $u(\cdot) \colon I \to B_p$, where $p(t) = e^{u(t) - K_p(u(t))} \cdot p$. 

Definition 13 (Velocity field of a curve and tangent bundle). 

(1) Assume $t \mapsto u(t) = s_p(p(t))$ is differentiable with derivative $\dot{u}(t)$. Define: 

$$Dp(t) = {}^e\mathbb{U}_p^{p(t)} \dot{u}(t) = \dot{u}(t) - \frac{d}{dt} K_p(u(t)) = \frac{d}{dt} \log\left(\frac{p(t)}{p}\right) = \frac{\dot{p}(t)}{p(t)} . \tag{1}$$

Note that $Dp$ does not depend on the chart $s_p$ and that the derivative of $t \mapsto p(t)$ in 
the last term of the equation is computed in $L^{\Phi_*}(p)$. The curve $t \mapsto (p(t), Dp(t))$ 
is the velocity field of the curve. 

(2) On the set $\{(p, v) \mid p \in \mathcal{P}_>, v \in B_p\}$, the charts: 

$$s_p \colon \{(q, w) \mid q \in \mathcal{E}(p), w \in B_q\} \ni (q, w) \mapsto \left(s_p(q), {}^e\mathbb{U}_q^p w\right) \in \mathcal{S}_p \times B_p \subset B_p \times B_p \tag{2}$$

define the tangent bundle, $T\mathcal{P}_>$. 

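Remark 2. Let $F \colon \mathcal{E}(p) \to \mathbb{R}$ be a function such that $F_p = F \circ e_p \colon \mathcal{S}_p \to \mathbb{R}$ is 
differentiable. Then: 

$$\frac{d}{dt} F(p(t)) = \frac{d}{dt} F_p(u(t)) = dF_p(u(t))\, \dot{u}(t) = dF_p(u(t))\, {}^e\mathbb{U}_{p(t)}^p Dp(t) .$$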

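3.1. Pretangent Bundle. Let $M$ be a density and $\mathcal{E}(M)$ its associated exponential 
manifold. Here $M$ is generic; later it will be the Maxwell distribution. All densities are 
assumed to be in $\mathcal{E}(M)$. 

Definition 14 (Pretangent bundle ${}^*T\mathcal{E}(M)$). The set 

$${}^*T\mathcal{E}(M) = \{(q, V) \mid q \in \mathcal{E}(M), V \in {}^*B_q\}$$

together with the charts: 

$${}^*s_p \colon {}^*T\mathcal{E}(M) \ni (q, V) \mapsto \left(s_p(q), {}^m\mathbb{U}_q^p V\right)$$

is the pretangent bundle, ${}^*T\mathcal{E}(M)$. 

Let $F$ be a vector field of the pretangent bundle, $F \colon \mathcal{E}(M) \to {}^*T\mathcal{E}(M)$. In the chart 
centered at $p$, the vector field is expressed by 

$$F_p(u) = {}^m\mathbb{U}_{e_p(u)}^p F \circ e_p(u) = \frac{e_p(u)}{p}\, F \circ e_p(u) \in {}^*B_p, \qquad u \in \mathcal{S}_p .$$

If $F_p$ is of class $C^1$ with derivative $dF_p(u) \in L(B_p, {}^*B_p)$, for each differentiable curve 
$t \mapsto p(t) = e^{u(t) - K_p(u(t))} \cdot p$ we have $Dp(t) = \frac{d}{dt} \log p(t) = \dot{u}(t) - \mathbb{E}_{p(t)}[\dot{u}(t)]$ and also 

$$\frac{d}{dt} F_p(u(t)) = dF_p(u(t))[\dot{u}(t)] = dF_p(u(t))\left[{}^e\mathbb{U}_{p(t)}^p Dp(t)\right] \in {}^*B_p .$$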

Definition 15 (Covariant derivative in ${}^*T\mathcal{E}(M)$). Let $F$ be a vector field of class $C^1$ 
of the pretangent bundle ${}^*T\mathcal{E}(M)$, and let $G$ be a continuous vector field in the tangent 
bundle $T\mathcal{E}(M)$. The covariant derivative is the vector field $D_G F$ of ${}^*T\mathcal{E}(M)$ defined at 
each $q \in \mathcal{E}(M)$ by 

$$(D_G F)(q) = dF_q(u)\big|_{u=0}[w], \qquad w = G(q) .$$

In the definition above the covariant derivative is computed in the mobile frame because 
its value at $q$ is computed using the expression in the chart centered at $q$. In a fixed frame 
centered at $p$ we write $s_p(q) = w$, so that $e_q(u) = e_p(u - \mathbb{E}_p[u] + w)$, and compare the two 
expressions of $F$ as follows: 

$$F_q(u) = {}^m\mathbb{U}_{e_q(u)}^q F \circ e_q(u) = {}^m\mathbb{U}_p^q\, {}^m\mathbb{U}_{e_q(u)}^p F \circ e_p(u - \mathbb{E}_p[u] + w) = {}^m\mathbb{U}_p^q F_p(u - \mathbb{E}_p[u] + w) .$$

Derivation in the direction $v \in B_q$ gives 

$$dF_q(u)[v] = {}^m\mathbb{U}_p^q\, dF_p(u - \mathbb{E}_p[u] + w)[v - \mathbb{E}_p[v]] ,$$

hence $dF_q(u) = {}^m\mathbb{U}_p^q\, dF_p({}^e\mathbb{U}_q^p u + w)\, {}^e\mathbb{U}_q^p$ and, at $u = 0$, we have $dF_q(0) = {}^m\mathbb{U}_p^q\, dF_p(w)\, {}^e\mathbb{U}_q^p$. 

It follows that the covariant derivative in the fixed frame at $p$ is 

$$(D_G F)(q) = {}^m\mathbb{U}_p^q\, dF_p(s_p(q))\, {}^e\mathbb{U}_q^p\, G(q) .$$

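The tangent and pretangent bundle can be coupled to produce the vector bundle of order 
2 defined by 

$$({}^*T \times T)\mathcal{E}(M) = \{(p, v, w) \mid p \in \mathcal{E}(M), v \in B_p, w \in {}^*B_p\}$$

with charts 

$$(q, v, w) \mapsto \left(s_p(q), {}^e\mathbb{U}_q^p v, {}^m\mathbb{U}_q^p w\right)$$

and the duality coupling $(p, v, w) \mapsto \langle v, w \rangle_p = \mathbb{E}_p[vw]$. 

Proposition 16 (Covariant derivative of the duality coupling). Let $F$ be a vector field 
of ${}^*T\mathcal{E}(M)$, and let $G, X$ be vector fields of $T\mathcal{E}(M)$, with $F, G$ of class $C^1$ and $X$ continuous. Then 

$$D_X \langle F, G \rangle = \langle D_X F, G \rangle + \langle F, D_X G \rangle .$$

Proof. Consider the real function $\mathcal{E} \ni q \mapsto \langle F, G \rangle(q) = \mathbb{E}_q[F(q) G(q)]$ in the chart 
centered at any $p \in \mathcal{E}(M)$: 

$$\mathcal{S}_p \ni u \mapsto \mathbb{E}_{e_p(u)}\left[F(e_p(u))\, G(e_p(u))\right] = \mathbb{E}_p\left[{}^m\mathbb{U}_{e_p(u)}^p F \circ e_p(u)\; {}^e\mathbb{U}_{e_p(u)}^p G \circ e_p(u)\right] = \mathbb{E}_p[F_p(u) G_p(u)] ,$$

and compute its derivative at 0 in the direction $X(p)$. □ 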


We refer to [?, 33] for further details on the geometric structure, namely the Hilbert 
bundle, the tangent mapping of a homeomorphism, and the Riemannian Hessian. We now 
turn to a basic example. 


3.2. Kullback-Leibler Divergence. The Kullback-Leibler divergence [23] on the 
exponential manifold $\mathcal{E}$ is the mapping 

$$D \colon (q_1, q_2) \in \mathcal{E} \times \mathcal{E} \mapsto D(q_1 \| q_2) = \mathbb{E}_{q_1}\left[\log \frac{q_1}{q_2}\right] .$$

Notice that, if $q_2 = e^{u - K_{q_1}(u)} \cdot q_1 \in \mathcal{E}(q_1)$, $u \in \mathcal{S}_{q_1}$, then $K_{q_1}(u) = \mathbb{E}_{q_1}[\log(q_1/q_2)]$ is 
the expression in the chart centered at $q_1$ of the marginal Kullback-Leibler divergence 
$q_2 \mapsto D(q_1 \| q_2)$. Therefore, the Kullback-Leibler divergence is non-negative valued and 
zero if and only if $q_2 = q_1$ because of Proposition 5, item (1). Its expression in the chart 
centered at a generic $p \in \mathcal{E}$ is 

$$D_p \colon (u_1, u_2) \in \mathcal{S}_p \times \mathcal{S}_p \mapsto K_p(u_2) - K_p(u_1) - dK_p(u_1)[u_2 - u_1] ,$$

which is the Bregman divergence [?] of the convex function $K_p \colon \mathcal{S}_p \to \mathbb{R}$. 
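
The identification of the Kullback-Leibler divergence with the Bregman divergence of $K_p$ can be checked numerically on a finite sample space. The sketch below is illustrative only; it assumes strictly positive probability vectors, uses $dK_p(u_1)[v] = \mathbb{E}_{q_1}[v]$ from Proposition 10, and all function names and numerical values are hypothetical.

```python
import math


def expect(p, f):
    """Expectation of f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))


def K(p, u):
    """Cumulant generating functional K_p(u) = log E_p[e^u]."""
    return math.log(expect(p, [math.exp(x) for x in u]))


def chart(p, q):
    """s_p(q) = log(q/p) - E_p[log(q/p)]."""
    log_ratio = [math.log(qi / pi) for qi, pi in zip(q, p)]
    mean = expect(p, log_ratio)
    return [x - mean for x in log_ratio]


def inverse_chart(p, u):
    """e_p(u) = exp(u - K_p(u)) * p."""
    k = K(p, u)
    return [math.exp(x - k) * pi for x, pi in zip(u, p)]


def kl(q1, q2):
    """Kullback-Leibler divergence D(q1 || q2) = E_{q1}[log(q1/q2)]."""
    return sum(a * math.log(a / b) for a, b in zip(q1, q2))


def bregman_of_K(p, u1, u2):
    """K_p(u2) - K_p(u1) - dK_p(u1)[u2 - u1], with dK_p(u1)[v] = E_{q1}[v]."""
    q1 = inverse_chart(p, u1)
    diff = [b - a for a, b in zip(u1, u2)]
    return K(p, u2) - K(p, u1) - expect(q1, diff)


p = [0.25, 0.25, 0.5]
q1 = [0.1, 0.6, 0.3]
q2 = [0.3, 0.3, 0.4]
d_chart = bregman_of_K(p, chart(p, q1), chart(p, q2))
d_direct = kl(q1, q2)
```

The two values `d_chart` and `d_direct` should agree up to rounding, whatever the choice of the reference density `p`.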

It follows from Proposition 5, item (4), that it is $C^\infty$ jointly in both variables and, moreover, 
analytic with 

$$D_p(u_1, u_2) = \sum_{n \ge 2} \frac{1}{n!}\, d^n K_p(u_1)\left[(u_2 - u_1)^{\otimes n}\right], \qquad \|u_1 - u_2\|_{\Phi,p} < 1 .$$

This regularity result is to be compared with what is available when the restriction 
$q_1 \smile q_2$ is removed, i.e., the semi-continuity [?, §9.4]. 

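The partial derivative of $D_p$ in the first variable, that is the derivative of $u_1 \mapsto 
D_p(u_1, u_2)$, in the direction $v \in B_p$ is 

$$d(u_1 \mapsto D_p(u_1, u_2))[v] = -d^2 K_p(u_1)[u_2 - u_1, v] = -\operatorname{Cov}_{q_1}\left(\log \frac{q_2}{q_1}, v\right) ,$$

with $q_i = e_p(u_i)$, $i = 1, 2$. If $v = {}^e\mathbb{U}_{q_1}^p w$, we have $-\operatorname{Cov}_{q_1}\left(\log \frac{q_2}{q_1}, v\right) = \operatorname{Cov}_{q_1}\left(\log \frac{q_1}{q_2}, w\right)$, 
so that we can compute both the covariant derivative of the partial functional $q \mapsto 
D(q \| q_2)$ and its gradient as 

$$D_w(q \mapsto D(q \| q_2)) = \operatorname{Cov}_q\left(\log \frac{q}{q_2}, w\right) = \mathbb{E}_q\left[\left(\log \frac{q}{q_2} - D(q \| q_2)\right) w\right] ,$$

$$\nabla(q \mapsto D(q \| q_2)) = \log \frac{q}{q_2} - D(q \| q_2) .$$

The negative gradient flow is 

$$\frac{d}{dt} \log \frac{q(t)}{q_2} = -\log \frac{q(t)}{q_2} + D(q(t) \| q_2) .$$

As $\frac{d}{dt}\left(e^t \log \frac{q(t)}{q_2}\right) = e^t D(q(t) \| q_2)$, for each $t$ the random variable 

$$e^t \log \frac{q(t)}{q_2} - \log \frac{q(0)}{q_2} = e^t \log \frac{q(t)}{q_2^{\,1 - e^{-t}}\, q(0)^{e^{-t}}}$$

is constant, so that $q(t) \propto q(0)^{e^{-t}} q_2^{\,1 - e^{-t}}$. It is the exponential arc of $q(0) \smile q_2$ in an 
exponential time scale. 

The partial derivative of $D_p$ in the second variable, that is the derivative of $u_2 \mapsto 
D_p(u_1, u_2)$, in the direction $v \in B_p$ is 

$$d(u_2 \mapsto D_p(u_1, u_2))[v] = dK_p(u_2)[v] - dK_p(u_1)[v] = \mathbb{E}_{q_2}[v] - \mathbb{E}_{q_1}[v] ,$$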

with $q_i = e_p(u_i)$, $i = 1, 2$. If $v = {}^e\mathbb{U}_{q_2}^p w$, we have $\mathbb{E}_{q_2}[v] - \mathbb{E}_{q_1}[v] = \mathbb{E}_{q_2}[w] - \mathbb{E}_{q_1}[w]$, so 
that we can compute both the covariant derivative of the partial functional $q \mapsto D(q_1 \| q)$ 
and its gradient as 

$$D_w(q \mapsto D(q_1 \| q)) = \mathbb{E}_q[w] - \mathbb{E}_{q_1}[w] ,$$

$$\nabla(q \mapsto D(q_1 \| q)) = 1 - \frac{q_1}{q} .$$

The negative gradient flow is 

$$\frac{\dot{q}(t)}{q(t)} = -\left(1 - \frac{q_1}{q(t)}\right) = \frac{q_1 - q(t)}{q(t)} ,$$

whose solution starting at $q_0$ is $q(t) = q_1 + (q_0 - q_1) e^{-t}$. It is a mixture model in an 
exponential time scale. 
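
Both gradient flows of this subsection have explicit solutions, so the flow equations can be checked by finite differences on a finite sample space. The following sketch is an illustration under that assumption: the function names, the step size `h` and the numerical values are hypothetical, and the residuals are expected to be small rather than exactly zero.

```python
import math


def kl(a, b):
    """Kullback-Leibler divergence D(a || b)."""
    return sum(x * math.log(x / y) for x, y in zip(a, b))


def normalize(w):
    total = sum(w)
    return [x / total for x in w]


def exp_arc(q0, q2, t):
    """Exponential arc q(t) proportional to q0^{e^{-t}} * q2^{1 - e^{-t}}."""
    a = math.exp(-t)
    return normalize([x ** a * y ** (1.0 - a) for x, y in zip(q0, q2)])


def mixture_arc(q0, q1, t):
    """Mixture arc q(t) = q1 + (q0 - q1) e^{-t}."""
    a = math.exp(-t)
    return [y + (x - y) * a for x, y in zip(q0, q1)]


def exp_flow_residual(q0, q2, t, h=1e-5):
    """Max over points of |d/dt log(q/q2) + log(q/q2) - D(q || q2)| at time t."""
    def log_ratio(s):
        return [math.log(x / y) for x, y in zip(exp_arc(q0, q2, s), q2)]
    plus, minus, now = log_ratio(t + h), log_ratio(t - h), log_ratio(t)
    d = kl(exp_arc(q0, q2, t), q2)
    return max(abs((a - b) / (2 * h) + c - d) for a, b, c in zip(plus, minus, now))


def mix_flow_residual(q0, q1, t, h=1e-5):
    """Max over points of |d/dt log q - (q1 - q)/q| at time t."""
    plus, minus, now = (mixture_arc(q0, q1, s) for s in (t + h, t - h, t))
    return max(
        abs((math.log(a) - math.log(b)) / (2 * h) - (y - c) / c)
        for a, b, c, y in zip(plus, minus, now, q1)
    )


q_start = [0.7, 0.2, 0.1]
q_target = [0.2, 0.3, 0.5]
res_exp = exp_flow_residual(q_start, q_target, 0.8)
res_mix = mix_flow_residual(q_start, q_target, 0.8)
```

The central difference has an error of order `h` squared, so residuals well below `1e-6` are the expected outcome for the values chosen here.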
4. Gaussian space 


In this Section the sample space is $\mathbb{R}^n$, and $M$ denotes the standard $n$-dimensional Gaussian 
density (we denote it by $M$ because of James Clerk Maxwell (1831-1879)) 

$$M(x) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{|x|^2}{2}\right), \qquad x \in \mathbb{R}^n , \tag{3}$$

and $\mathcal{E}$ is the exponential manifold containing $M$. We recall that the Orlicz space $L^\Phi(M)$ 
is defined with the Young function $\Phi := x \mapsto \cosh x - 1$. The following propositions 
depend on the specific properties of the Gaussian density $M$. They do not hold in general. 

Proposition 17. 1. The Orlicz space $L^\Phi(M)$ contains all polynomials with degree 
up to 2. 

2. The Orlicz space $L^{\Phi_*}(M)$ contains all polynomials. 

Proof. 1. If $f$ is a polynomial of degree $d \le 2$ then 

$$\int e^{\alpha f(x)} M(x) \, dx = \frac{1}{(2\pi)^{n/2}} \int e^{\alpha f(x) - \frac{1}{2}|x|^2} \, dx$$

and the latter is finite for all $\alpha$ such that $\alpha \operatorname{Hess} f - I$ is negative definite. 

2. The result comes from the fact that all polynomials belong to $L^2(M)$ and one has 
$L^2(M) \subset L^{\Phi_*}(M)$. □ 

4.1. Boltzmann-Gibbs Entropy. While the Kullback-Leibler divergence $D(q_1 \| q_2)$ of 
Sec. 3.2 is defined and finite if the densities $q_1$ and $q_2$ belong to the same exponential 
manifold, the Boltzmann-Gibbs entropy (BG-entropy in the sequel) 

$$H(q) = -\mathbb{E}_q[\log(q)]$$

could be either undefined or infinite, precisely $-\infty$, everywhere on some exponential 
manifolds, or finite everywhere on other exponential manifolds. 

Proposition 18. Assume $p \smile q$. Then: 

(1) For each $a \ge 1$, $\log p \in L^a(p)$ if, and only if, $\log q \in L^a(q)$. 

(2) $\log p \in L^\Phi(p)$ if, and only if, $\log q \in L^\Phi(q)$. 

Proof. If $p, q$ belong to the same exponential manifold, we can write $q = e^{u - K_p(u)} \cdot p$ and, 
from Theorem 8, item (4), we obtain $\log q - \log p = u - K_p(u) \in L^\Phi(p) = L^\Phi(q)$, so 
that $\log q \in L^a(q)$ if, and only if, $\log p \in L^a(p)$, $a \ge 1$, and $\log q \in L^\Phi(q)$ if, and only 
if, $\log p \in L^\Phi(p)$. □ 

In order to obtain a smooth function, we study the BG-entropy $H(q)$ on all manifolds 
$\mathcal{E}$ such that for at least one, and hence for all, $p \in \mathcal{E}$, it holds $\log(p) \in L^\Phi(p)$. In 
such an exponential manifold we can write 

$$D(q \| p) = -H(q) - \mathbb{E}_q[\log p] ,$$

so that $H(q) \le -\mathbb{E}_q[\log p]$. 

For example, this is the case when the reference measure is finite and $p$ is constant. 
Another notable example is the Gaussian case, where the sample space is $\mathbb{R}^n$ endowed with 
the Lebesgue measure and $p(x) \propto \exp\left(-\frac{|x|^2}{2}\right)$. In such a case $\int \cosh(\alpha |x|^2) \exp(-|x|^2/2) \, dx < 
+\infty$ if $0 < \alpha < 1/2$. 

We investigate here the main properties of the BG-entropy in this context. First, one 
has 

Proposition 19. The BG-entropy is a smooth real function on the exponential manifold 
$\mathcal{E}$. Namely, if $p \in \mathcal{E}$ then 

$$H_p \colon u \in \mathcal{S}_p \mapsto H \circ e_p(u)$$

is a $C^\infty$ real function. Moreover, its derivative in the direction $v$ equals 

$$dH_p(u)[v] = -\operatorname{Cov}_q(u + \log p, v) ,$$

where $q = e_p(u)$. 

Proof. As 

$$-\log q = -u + K_p(u) - \log p = -(u + \log p + H(p)) + K_p(u) + H(p) \in L^\Phi(p) ,$$

with $-(u + \log p + H(p)) \in B_p$, the representation of the BG-entropy in the chart centered 
at $p$ is 

$$\begin{aligned}
H_p(u) &= H \circ e_p(u) \\
&= -\mathbb{E}_q[u + \log p + H(p)] + K_p(u) + H(p) \\
&= -dK_p(u)[u + \log p + H(p)] + K_p(u) + H(p) \\
&= (K_p(u) - dK_p(u)[u]) - dK_p(u)[\log p + H(p)] + H(p) ,
\end{aligned}$$

hence $u \mapsto H_p(u)$ is a $C^\infty$ real function. Notice that $K_p(u) - dK_p(u)[u] \le 0$, hence 
$H(q) \le -\mathbb{E}_q[\log p]$, as we already know. The derivative of $H_p$ in the direction $v$ equals 

$$dH_p(u)[v] = -d^2K_p(u)[u + \log p + H(p), v] - dK_p(u)[v] + dK_p(u)[v] = -\operatorname{Cov}_q(u + \log p, v) .$$

□ 

Notice that, for $u = 0$ and $q = e_p(u)$, we have $\mathbb{E}_q[v] = \mathbb{E}_p[v] = 0$, hence 

$$dH_p(0)[v] = -\operatorname{Cov}_p(\log p, v) = -\mathbb{E}_p[\log(p)\, v] .$$

Proposition 20. The gradient field $\nabla H$ over $\mathcal{E}$ can be identified, at each $p$, with the random 
variable $\nabla H(p) \in B_p \subset {}^*B_p$, 

$$\nabla H(p) = -(\log(p) + H(p)) .$$

Proof. The covariant derivative $D_G H$ at $p \in \mathcal{E}$ with respect to the vector field $G$ defined 
on $\mathcal{E}$, with $G(p) \in B_p$ and $p \in \mathcal{E}$, is 

$$D_G H(p) = -\mathbb{E}_p[\log(p)\, G(p)] = -\langle \log(p) + H(p), G(p) \rangle_p .$$

The gradient field $\nabla H$ over $\mathcal{E}$ is then defined by $D_G H(p) = \langle \nabla H(p), G(p) \rangle_p$. This 
justifies the identification with the random variable $\nabla H(p) = -(\log(p) + H(p)) \in B_p \subset 
{}^*B_p$. □ 

Remark 3. The equation $\nabla H(p) = 0$ implies $\log p = -H(p)$, hence $p$ has to be constant, 
and this requires that the reference measure $\mu$ is finite. 

We refer to [?] for more details on the BG-entropy and in particular on the evolution 
of $H$ on curves in $\mathcal{E}$ of the type $I \ni t \mapsto f_t$. 
Figure 1. Elastic collision for two opposite $\sigma$'s corresponding to the two 
labelings of the velocities after collision. Velocities $v$ and $w$ are before 
collision, $\bar{v}$ and $\bar{w}$ after collision. The unit vectors are $\kappa = \widehat{v - w}$, $\sigma = \widehat{\bar{v} - \bar{w}}$, 
$\omega = \widehat{v - \bar{v}} = \widehat{\kappa - \sigma}$. The red dotted line represents the space generated by 
$\omega$. The angles are given by $\cos\vartheta = \kappa \cdot \omega$, $\vartheta \in [0, \pi]$. As $\omega \cdot (\sigma + \kappa) = 0$, the 
angles are related by $\vartheta = (\pi - \phi)/2$. 


5. Boltzmann equation 

We consider a space-homogeneous Boltzmann operator as it is defined, for example, 
in [?] and [?]. We retell the basic story in order to introduce our notations and the 
IG background. Orlicz spaces as a setting for Boltzmann's equation have been recently 
proposed in [?], while the use of exponential statistical manifolds has been suggested 
in [?, Example 11] and sketched in [33, Sec 4.4]. We start with an improvement of the 
latter, a few repetitions being justified by consistency between this presentation and [?, 
1.3, 4.5-6]; compare also Prop. 21 below. 


5.1. Collision kinematics. We review our notations, see Fig. 1, cf. [?, Fig. 1]. 
We denote by $v, w \in \mathbb{R}^3$ the velocities before collision, while the velocities after collision 
are denoted by $\bar{v}, \bar{w} \in \mathbb{R}^3$. The quadruple $(v, w, \bar{v}, \bar{w}) \in (\mathbb{R}^3)^4$ is assumed to satisfy the 
conservation laws 

$$F_1(v, w, \bar{v}, \bar{w}) = v + w - (\bar{v} + \bar{w}) = 0 , \tag{4}$$

$$F_2(v, w, \bar{v}, \bar{w}) = |v|^2 + |w|^2 - (|\bar{v}|^2 + |\bar{w}|^2) = 0 , \tag{5}$$

which define an algebraic variety $\mathcal{M}$ that we expect to have dimension $12 - (3 + 1) = 8$. 
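The Jacobian matrix of the four defining Eq.s (4) and (5) is 

$$\begin{array}{c|cccccccccccc}
 & v_1 & v_2 & v_3 & w_1 & w_2 & w_3 & \bar{v}_1 & \bar{v}_2 & \bar{v}_3 & \bar{w}_1 & \bar{w}_2 & \bar{w}_3 \\ \hline
(4)_1 & 1 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & -1 & 0 & 0 \\
(4)_2 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & -1 & 0 \\
(4)_3 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & -1 \\
(5) & 2v_1 & 2v_2 & 2v_3 & 2w_1 & 2w_2 & 2w_3 & -2\bar{v}_1 & -2\bar{v}_2 & -2\bar{v}_3 & -2\bar{w}_1 & -2\bar{w}_2 & -2\bar{w}_3
\end{array} \tag{6}$$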
The Jacobian matrix in Eq. (6) has, in general position, full rank, and rank 3 if $v = w = 
\bar{v} = \bar{w}$. We denote by $\mathcal{M}^*$ the 8-dimensional manifold $\mathcal{M} \setminus \{v = w = \bar{v} = \bar{w}\}$. In the 
sequel, for $v \ne w$, we set 

$$\widehat{v - w} = \frac{v - w}{|v - w|} .$$

From (4) and (5) it follows the conservation of both the scalar product, $v \cdot w = \bar{v} \cdot \bar{w}$, and 
of the norm of the difference, $|v - w| = |\bar{v} - \bar{w}|$, so that all the vectors of the quadruple 
lie on a circle with center $z = (v + w)/2 = (\bar{v} + \bar{w})/2$ and are the four vertices of a 
rectangle. If $v = w$ then $\bar{v} = \bar{w}$, and also $v = w = \bar{v} = \bar{w}$ as the circle collapses to one 
point, hence we have $\mathcal{M}^* = \mathcal{M} \setminus \{v = w\} = \mathcal{M} \setminus \{\bar{v} = \bar{w}\}$. 

There are various explicit and interesting parametrizations of $\mathcal{M}^*$ available. 

An elementary parametrization consists of any algebraic solution of Eq.s (4) and (5) 
with respect to any of the free 8 coordinates. Other parametrizations are used in the 
literature, see classical references on the Boltzmann equation, e.g. [41]. 

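A first parametrization is 

$$(v, w, \sigma) \in (\mathbb{R}^3 \times \mathbb{R}^3)^* \times S^2 \mapsto (v, w, A_\sigma(v, w)) \in \mathcal{M}^* \subset (\mathbb{R}^3)^4 , \tag{7}$$

where $S^2 = \{\sigma \in \mathbb{R}^3 \mid |\sigma| = 1\}$ and the collision transformation $A_\sigma \colon (v, w) \mapsto (\bar{v}, \bar{w}) = 
(v_\sigma, w_\sigma)$ is: 

$$\begin{cases}
v_\sigma = \dfrac{v + w}{2} + \dfrac{|v - w|}{2}\, \sigma \\[2ex]
w_\sigma = \dfrac{v + w}{2} - \dfrac{|v - w|}{2}\, \sigma
\end{cases} \tag{8}$$

Vice versa, on $\mathcal{M}^*$ the collision transformation depends on the unit vector $\sigma = \widehat{\bar{v} - \bar{w}} \in S^2$, 
while the other terms depend on the collision invariants, as $|v - w|^2 = 2(|v|^2 + |w|^2) - 
|v + w|^2$. In conclusion, the transformation in Eq. (7) is 1-to-1 from $(\mathbb{R}^3 \times \mathbb{R}^3)^* \times S^2$ to 
$\mathcal{M}^*$, where $(\mathbb{R}^3 \times \mathbb{R}^3)^* = \{(v, w) \in \mathbb{R}^3 \times \mathbb{R}^3 \mid v \ne w\}$. 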
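
The collision transformation of Eq. (8) is simple enough to be written out and tested against the conservation laws (4) and (5). The sketch below is illustrative only; the function names and the numerical values are hypothetical, and velocities are represented as plain lists of three floats.

```python
import math


def collide_sigma(v, w, sigma):
    """Post-collision velocities (v_sigma, w_sigma) for a unit vector sigma."""
    center = [(a + b) / 2.0 for a, b in zip(v, w)]
    radius = math.dist(v, w) / 2.0  # |v - w| / 2
    v_s = [c + radius * s for c, s in zip(center, sigma)]
    w_s = [c - radius * s for c, s in zip(center, sigma)]
    return v_s, w_s


def momentum(v, w):
    return [a + b for a, b in zip(v, w)]


def energy(v, w):
    return sum(a * a for a in v) + sum(b * b for b in w)


def unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]


v = [1.0, 0.5, -2.0]
w = [-0.3, 2.0, 0.7]
sigma = unit([2.0, -1.0, 2.0])
v_bar, w_bar = collide_sigma(v, w, sigma)

# Choosing sigma as the unit vector of v - w leaves the pair unchanged.
kappa = unit([a - b for a, b in zip(v, w)])
v_same, w_same = collide_sigma(v, w, kappa)
```

Momentum and energy of `(v_bar, w_bar)` should match those of `(v, w)` up to rounding, and the unit vector of `v_bar - w_bar` should be `sigma` again.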

A second parametrization of $\mathcal{M}^*$ is obtained using the common space of two parallel 
sides of the velocity rectangle, $\operatorname{Span}(v - \bar{v}) = \operatorname{Span}(w - \bar{w})$, so that $v - \bar{v} = \bar{w} - w = 
\Pi(v - w)$, where $\Pi$ is the orthogonal projection on the subspace. Vice versa, given any $\Pi$ 
in the set $\Pi(1)$ of projections of rank 1, consider the mapping 

$$A_\Pi = \begin{bmatrix} I - \Pi & \Pi \\ \Pi & I - \Pi \end{bmatrix} , \qquad
\begin{cases}
v_\Pi = v - \Pi(v - w) = (I - \Pi) v + \Pi w \\
w_\Pi = w + \Pi(v - w) = \Pi v + (I - \Pi) w
\end{cases} \tag{9}$$

The components in the direction of the image of $\Pi$ are exchanged, $\Pi v_\Pi = \Pi w$ and 
$\Pi w_\Pi = \Pi v$, while the orthogonal components are conserved. If $\omega$ is any of the two unit 
vectors such that $\Pi = \omega \otimes \omega$, the matrix $A_\Pi = A_{\omega \otimes \omega}$ does not depend on the direction 
of $\omega \in S^2$. Notice that $A_\Pi^T = A_\Pi$ and $A_\Pi A_\Pi^T = I_6$, that is, $A_\Pi$ is an orthogonal symmetric 
matrix. This parametrization uses the set $\Pi(1)$ of projection matrices of rank 1, 

$$(\mathbb{R}^3 \times \mathbb{R}^3)^* \times \Pi(1) \ni (v, w, \Pi) \mapsto (v, w, A_\Pi(v, w)) \in \mathcal{M}^* \subset (\mathbb{R}^3)^4 . \tag{10}$$

The $\sigma$-parametrization (7) and the $\Pi$-parametrization (10) are related as follows. Given 
the unit vector $\kappa = \widehat{v - w}$, the parameters $\Pi$ and $\sigma$ in Eq.s (7) and (10) are in 1-to-1 relation 
as 

$$\sigma = (I - 2\Pi)\kappa, \qquad \Pi = \widehat{\kappa - \sigma} \otimes \widehat{\kappa - \sigma} . \tag{11}$$

The transition map from the parametrization (7) to the parametrization (10) is 

$$(v, w, \sigma) \mapsto \left(v, w, \left(\widehat{\widehat{v - w} - \sigma}\right) \otimes \left(\widehat{\widehat{v - w} - \sigma}\right)\right) ,$$

while the inverse transition is 

$$(v, w, \Pi) \mapsto \left(v, w, (I - 2\Pi)\, \widehat{v - w}\right) .$$
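
The relation (11) between the two parametrizations can be tested numerically. In the sketch below a rank-one projection is represented by a unit vector $\omega$ with $\Pi = \omega \otimes \omega$; this representation, the function names and the numerical values are assumptions made for the illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(a, c):
    return [c * x for x in a]


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def collide_sigma(v, w, sigma):
    """Eq. (8): (v + w)/2 plus or minus (|v - w|/2) sigma."""
    center = scale(add(v, w), 0.5)
    radius = math.sqrt(dot(sub(v, w), sub(v, w))) / 2.0
    return add(center, scale(sigma, radius)), sub(center, scale(sigma, radius))


def collide_pi(v, w, omega):
    """Eq. (9) with Pi = omega (x) omega: v - Pi(v - w) and w + Pi(v - w)."""
    proj = scale(omega, dot(omega, sub(v, w)))  # Pi (v - w)
    return sub(v, proj), add(w, proj)


def sigma_from_omega(v, w, omega):
    """sigma = (I - 2 Pi) kappa, with kappa the unit vector of v - w."""
    kappa = unit(sub(v, w))
    return sub(kappa, scale(omega, 2.0 * dot(omega, kappa)))


def omega_from_sigma(v, w, sigma):
    """A unit vector spanning the image of Pi: the unit vector of kappa - sigma."""
    return unit(sub(unit(sub(v, w)), sigma))


v = [1.0, 0.5, -2.0]
w = [-0.3, 2.0, 0.7]
omega = unit([1.0, 2.0, 2.0])

by_pi = collide_pi(v, w, omega)
sigma = sigma_from_omega(v, w, omega)
by_sigma = collide_sigma(v, w, sigma)
omega_back = omega_from_sigma(v, w, sigma)      # equals omega up to sign
by_pi_back = collide_pi(v, w, omega_back)
twice = collide_pi(by_pi[0], by_pi[1], omega)   # A_Pi is an involution
```

Since $A_\Pi$ is symmetric and orthogonal, applying it twice returns the original pair, which is what `twice` is meant to exhibit.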


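5.2. Uniform probabilities on $S^2$ and on $\Pi(1)$. Let $\mu$ be the uniform probability on $S^2$ 
computed, for example, in polar coordinates by 

$$\int_{S^2} f(\sigma)\, \mu(d\sigma) = \frac{1}{4\pi} \int_0^\pi \sin\phi \, d\phi \int_0^{2\pi} f(\sin\phi \cos\theta\, u_1 + \sin\phi \sin\theta\, u_2 + \cos\phi\, u_3) \, d\theta , \tag{12}$$

where $u_1, u_2, u_3$ is any orthonormal basis of $\mathbb{R}^3$, that is $U = [u_1\, u_2\, u_3] \in SO(3)$. In such a 
way, $U \colon S^2 \to S^2$ and the right hand side of Eq. (12) does not depend on $U$. 

As the mapping $S^2 \ni \omega \mapsto \Pi = \omega \otimes \omega \in \Pi(1)$ is a 2-covering, we define the image $\nu$ of 
$\mu$ by the equation 

$$\int_{\Pi(1)} g(\Pi)\, \nu(d\Pi) = \int_{S^2} g(\omega \otimes \omega)\, \mu(d\omega) = 2 \int_{\{\omega \in S^2 \mid \kappa \cdot \omega > 0\}} g(\omega \otimes \omega)\, \mu(d\omega) , \tag{13}$$

where $\kappa \in S^2$ is any unit vector used to split $S^2$ in two parts, $\{\omega \mid \kappa \cdot \omega > 0\}$ and $\{\omega \mid \kappa \cdot \omega < 0\}$. 

Eq. (13) defines a probability $\nu$ on $\Pi(1)$ such that we have the invariance 

$$\int_{\Pi(1)} g(U \Pi U')\, \nu(d\Pi) = \int_{\Pi(1)} g(\Pi)\, \nu(d\Pi), \qquad U \in SO(3) .$$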

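Let us compute the image $T_*\mu$ of the uniform measure $\mu$ under the action of the 
transformation $T \colon \sigma \mapsto \omega = \widehat{\kappa - \sigma}$. If in Eq. (12) we take $u_3 = \kappa$, that is an orthonormal 
basis $(u_1, u_2, \kappa)$, then, for $\phi, \theta$ such that $\sigma = \sin\phi \cos\theta\, u_1 + \sin\phi \sin\theta\, u_2 + \cos\phi\, \kappa$ and 
$\vartheta = (\pi - \phi)/2$ (see Fig. 1), we have 

$$\begin{aligned}
\omega = \widehat{\kappa - \sigma} &= \sin\vartheta \cos\theta\, u_1 + \sin\vartheta \sin\theta\, u_2 + \cos\vartheta\, \kappa \\
&= \sin\left(\frac{\pi - \phi}{2}\right) \cos\theta\, u_1 + \sin\left(\frac{\pi - \phi}{2}\right) \sin\theta\, u_2 + \cos\left(\frac{\pi - \phi}{2}\right) \kappa \\
&= \cos\left(\frac{\phi}{2}\right) \cos\theta\, u_1 + \cos\left(\frac{\phi}{2}\right) \sin\theta\, u_2 + \sin\left(\frac{\phi}{2}\right) \kappa
\end{aligned} \tag{14}$$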


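and for all integrable $f \colon S^2 \to \mathbb{R}$ one has, with the change of variable $\psi = \phi/2$, 

$$\begin{aligned}
\int_{S^2} f(\omega)\, T_*\mu(d\omega) &= \int_{S^2} f(\widehat{\kappa - \sigma})\, \mu(d\sigma) \\
&= \frac{1}{4\pi} \int_0^\pi \sin\phi \, d\phi \int_0^{2\pi} d\theta\, f\left(\cos\left(\tfrac{\phi}{2}\right) \cos\theta\, u_1 + \cos\left(\tfrac{\phi}{2}\right) \sin\theta\, u_2 + \sin\left(\tfrac{\phi}{2}\right) \kappa\right) \\
&= \frac{1}{4\pi} \int_0^{\pi/2} 2 \sin(2\psi) \, d\psi \int_0^{2\pi} d\theta\, f(\cos\psi \cos\theta\, u_1 + \cos\psi \sin\theta\, u_2 + \sin\psi\, \kappa) \\
&= \frac{1}{4\pi} \int_0^{\pi/2} 4 \cos\psi \sin\psi \, d\psi \int_0^{2\pi} d\theta\, f(\cos\psi \cos\theta\, u_1 + \cos\psi \sin\theta\, u_2 + \sin\psi\, \kappa) \\
&= \int_{S^2} f(\omega)\, 4 (\kappa \cdot \omega)^+ \, \mu(d\omega) ,
\end{aligned} \tag{15}$$

compare [10, 4.5]. 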

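In particular, for a symmetric function, $f(\omega) = f(-\omega)$, we have 

$$\int_{S^2} f(\omega)\, T_*\mu(d\omega) = \int_{S^2} f(\widehat{\kappa - \sigma})\, \mu(d\sigma) = \int_{S^2} f(\omega)\, 2 |\kappa \cdot \omega| \, \mu(d\omega) . \tag{16}$$

It follows, for each integrable $g \colon \Pi(1) \to \mathbb{R}$, that 

$$\int_{S^2} g\left(\widehat{\kappa - \sigma} \otimes \widehat{\kappa - \sigma}\right) \mu(d\sigma) = \int_{S^2} g(\omega \otimes \omega)\, 2 |\kappa \cdot \omega| \, \mu(d\omega) . \tag{17}$$

Notice that if we integrate Eq. (15) with respect to $\kappa$ we obtain 

$$\int_{S^2 \times S^2} f(\widehat{\kappa - \sigma})\, \mu(d\sigma)\, \mu(d\kappa) = \int_{S^2} f(\sigma)\, \mu(d\sigma) , \tag{18}$$

because 

$$\int_{S^2} 4 (\kappa \cdot \sigma)^+ \, \mu(d\kappa) = 1 . \tag{19}$$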
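
The change-of-measure identity for symmetric functions and the normalization used above can be checked by a simple quadrature in the polar coordinates of Eq. (12). The sketch below is an illustration only: it fixes $\kappa$ as the third coordinate axis, uses a midpoint rule whose resolution is an arbitrary choice, and the test function `f_even` is hypothetical.

```python
import math

KAPPA = (0.0, 0.0, 1.0)  # polar axis, playing the role of kappa


def sphere_mean(f, n_phi=200, n_theta=200):
    """Midpoint-rule approximation of the integral of f against the uniform probability on S^2."""
    d_phi = math.pi / n_phi
    d_theta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * d_phi
        s, c = math.sin(phi), math.cos(phi)
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            point = (s * math.cos(theta), s * math.sin(theta), c)
            total += f(point) * s
    return total * d_phi * d_theta / (4.0 * math.pi)


def omega_of(sigma):
    """Unit vector of kappa - sigma (defined for sigma different from kappa)."""
    d = tuple(k - s for k, s in zip(KAPPA, sigma))
    norm = math.sqrt(sum(x * x for x in d))
    return tuple(x / norm for x in d)


def f_even(omega):
    """A symmetric test function: f(omega) = f(-omega)."""
    return omega[2] ** 2 + omega[0] * omega[1]


# Left and right hand sides of the identity for symmetric functions.
lhs_sym = sphere_mean(lambda s: f_even(omega_of(s)))
rhs_sym = sphere_mean(lambda w: f_even(w) * 2.0 * abs(w[2]))

# Normalization of the density 4 (kappa . sigma)^+ with respect to mu.
total_mass = sphere_mean(lambda k: 4.0 * max(k[2], 0.0))
```

With this test function both sides can also be computed by hand and equal $1/2$, so the quadrature values should be close to that number, and `total_mass` close to 1.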


5.3. Conditioning on the collision invariants. Given a function $g \colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$, Eq. 
(8) shows that the function 

$$\hat{g} \colon (v, w) \mapsto \int_{S^2} g(v_\sigma, w_\sigma)\, \mu(d\sigma) = \int_{S^2} g\left(A_\sigma(v, w)\right) \mu(d\sigma) \tag{20}$$

depends on the collision invariants only. This, in turn, implies that $\hat{g}$ is the conditional 
expectation of $g$ with respect to the collision invariants under any probability distribution 
on $\mathbb{R}^3 \times \mathbb{R}^3$ such that the collision invariants and the unit vector $\kappa = \widehat{v - w}$ are independent, 
the unit vector $\kappa$ being uniformly distributed. See below a more precise statement in the 
case of the Gaussian distribution. 

On the sample space $(\mathbb{R}^3, dv)$, let $M$ be the standard normal density defined in (3) (the 
Maxwell density). As, for all $\Pi \in \Pi(1)$, $A_\Pi A_\Pi^T = A_\Pi A_\Pi = I_6$, in particular $|\det A_\Pi| = 1$, 
we have for each $(V, W) \sim M \otimes M$ (i.e. $V, W$ are i.i.d. with distribution $M$) that 

$$A_\Pi(V, W) = (V_\Pi, W_\Pi) \sim (V, W) . \tag{21}$$


Under the same distributions, the random variables ^ ~ indepen¬ 

dent, with distributions given by 

V + W lU-lUp o ■-■ 

(22) ^_^n( 03 ,/ 3 ), J-^~X^(3), U-lU~/r, 

respectively. Hence, given any S' ~ /r such thatS, are independent, we get 

(23) i5(U,lU)~(U,lU) . 

This equality of distribution generalizes the equality of random variables lU) = 

(U, W). We state the results obtained above as follows. 
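The properties of the collision map used above can be illustrated on one sample. The sketch below assumes the $\sigma$-parametrization $A_\sigma(v,w) = \big(\tfrac{v+w}{2} + \tfrac{|v-w|}{2}\sigma,\ \tfrac{v+w}{2} - \tfrac{|v-w|}{2}\sigma\big)$, which is the form appearing in the proof of the Conditioning proposition; the function name `collision_map` and the random inputs are illustrative only.

```python
import numpy as np

def collision_map(v, w, sigma):
    """A_sigma(v, w): keep the mean velocity, point the relative velocity along sigma."""
    m = 0.5 * (v + w)
    r = 0.5 * np.linalg.norm(v - w)
    return m + r * sigma, m - r * sigma

rng = np.random.default_rng(1)
v, w = rng.standard_normal(3), rng.standard_normal(3)
sigma = rng.standard_normal(3)
sigma /= np.linalg.norm(sigma)

v1, w1 = collision_map(v, w, sigma)
# Collision invariants: momentum v + w and energy |v|^2 + |w|^2 are preserved.
momentum_gap = np.linalg.norm((v1 + w1) - (v + w))
energy_gap = abs(v1 @ v1 + w1 @ w1 - v @ v - w @ w)

# With sigma equal to the unit vector of v - w the map is the identity.
k = (v - w) / np.linalg.norm(v - w)
v2, w2 = collision_map(v, w, k)
identity_gap = max(np.linalg.norm(v2 - v), np.linalg.norm(w2 - w))

print(momentum_gap, energy_gap, identity_gap)
```

All three gaps vanish up to rounding, which is the pointwise content behind Eqs. (21) and (23).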

The image distribution of $M \otimes M$ induced on $(\mathbb{R}^3)^4$ by the parametrization in Eq. (??) is supported by the manifold $\mathcal{M}$. Such a distribution has the property that the projections on both the first two and the last two components are $M \otimes M$. The joint distribution is not Gaussian; in fact the support $\mathcal{M}$ is not a linear subspace. We will call this distribution the normal collision distribution.

The second parametrization in Eq. (??) shows that the variety $\mathcal{M}$ contains the bundle of linear spaces

\[
(v, w) \mapsto \big(v, w, (I - \Pi)v + \Pi w, \Pi v + (I - \Pi)w\big), \qquad \Pi \in \Pi(1).
\]

The distribution of

\[
\mathcal{M} \ni (v,w,v',w') \mapsto \Pi = \widehat{v - v'} \otimes \widehat{v - v'} \in \Pi(1) \tag{24}
\]

under the normal collision distribution is obtained from Eq. (11). In fact $\Pi$ is the projector on the subspace generated by $\widehat{\kappa - \sigma}$ where $(v,w,\sigma) \mapsto \kappa = \widehat{v-w}$ is uniformly distributed and independent from $\sigma$. Hence, Eq. (18) shows that $(v,w,\sigma) \mapsto \widehat{\kappa-\sigma}$ is uniformly distributed on $S^2$ so that the distribution in Eq. (24) is the $\nu$ measure defined in Eq. (13). Conditionally to $\Pi$, the normal collision distribution is Gaussian with covariance

\[
\begin{pmatrix}
I & 0 & I-\Pi & \Pi \\
0 & I & \Pi & I-\Pi \\
I-\Pi & \Pi & I & 0 \\
\Pi & I-\Pi & 0 & I
\end{pmatrix}.
\]

We can give the previous remarks a more probabilistic form as follows. 

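Proposition 21 (Conditioning). Let $M$ be the density of the standard normal $N(0_3, I_3)$ and $g\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be an integrable function. The following hold.

(1) If $(V, W) \sim M \otimes M$ and $\Pi \in \Pi(1)$, then

\[
E\big(g(V, W) \,\big|\, V + W, |V|^2 + |W|^2\big) = E\big(g(A_\Pi(V, W)) \,\big|\, V + W, |V|^2 + |W|^2\big).
\]

(2) If $(V, W) \sim M \otimes M$, then

\[
\int_{S^2} g\big(A_\sigma(V, W)\big)\, \mu(d\sigma) = E\big(g(V, W) \,\big|\, V + W, |V|^2 + |W|^2\big).
\]

(3) If $(V, W) \sim M \otimes M$, then

\[
\int_{\Pi(1)} g\big(A_\Pi(V,W)\big)\, \big|\Pi\, \widehat{V - W}\big|\, \nu(d\Pi) = E\big(g(V,W) \,\big|\, V + W, |V|^2 + |W|^2\big).
\]

(4) Assume $(V, W) \sim F$, $F \in \mathcal{E}(M \otimes M)$, with $F(v, w) = f(v, w)M(v)M(w)$. Then

\[
\left(\int_{S^2} f \circ A_\sigma\, \mu(d\sigma)\right) \cdot M \otimes M \in \mathcal{E}(M \otimes M)
\]

and

\[
E_F\big(g(V,W) \,\big|\, V + W, |V|^2 + |W|^2\big)
= \frac{\int_{S^2} g\big(A_\sigma(V, W)\big)\, f\big(A_\sigma(V, W)\big)\, \mu(d\sigma)}{\int_{S^2} f\big(A_\sigma(V, W)\big)\, \mu(d\sigma)}
= \frac{\int_{\Pi(1)} g\big(A_\Pi(V,W)\big)\, f\big(A_\Pi(V,W)\big)\, \big|\Pi\, \widehat{V-W}\big|\, \nu(d\Pi)}{\int_{\Pi(1)} f\big(A_\Pi(V,W)\big)\, \big|\Pi\, \widehat{V-W}\big|\, \nu(d\Pi)}.
\]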


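Proof. (1) We use $V + W = V_\Pi + W_\Pi$, $|V|^2 + |W|^2 = |V_\Pi|^2 + |W_\Pi|^2$, $(V,W) \sim (V_\Pi, W_\Pi)$. For all bounded $h_1\colon \mathbb{R}^3 \to \mathbb{R}$ and $h_2\colon \mathbb{R} \to \mathbb{R}$, we have

\[
\begin{aligned}
& E\Big(E\big(g(A_\Pi(V, W)) \,\big|\, V + W, |V|^2 + |W|^2\big)\, h_1(V + W)\, h_2(|V|^2 + |W|^2)\Big) \\
&\quad = E\big(g(V_\Pi, W_\Pi)\, h_1(V + W)\, h_2(|V|^2 + |W|^2)\big) \\
&\quad = E\big(g(V_\Pi, W_\Pi)\, h_1(V_\Pi + W_\Pi)\, h_2(|V_\Pi|^2 + |W_\Pi|^2)\big) \\
&\quad = E\big(g(V, W)\, h_1(V + W)\, h_2(|V|^2 + |W|^2)\big) \\
&\quad = E\Big(E\big(g(V, W) \,\big|\, V + W, |V|^2 + |W|^2\big)\, h_1(V + W)\, h_2(|V|^2 + |W|^2)\Big).
\end{aligned}
\]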
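(2) For a generic integrable $h\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ we have

\[
\begin{aligned}
E\big(h(V,W)\big)
&= E\left(h\left(\frac{V+W}{2} + \frac{|V-W|}{2}\,\widehat{V-W},\ \frac{V+W}{2} - \frac{|V-W|}{2}\,\widehat{V-W}\right)\right) \\
&= E\left(\int_{S^2} h\left(\frac{V+W}{2} + \frac{|V-W|}{2}\,\sigma,\ \frac{V+W}{2} - \frac{|V-W|}{2}\,\sigma\right) \mu(d\sigma)\right) \\
&= E\left(\int_{S^2} h\big(A_\sigma(V,W)\big)\, \mu(d\sigma)\right),
\end{aligned}
\]

because $\widehat{V-W} \sim \mu$, and $V+W$, $|V-W|$, $\widehat{V-W}$ are independent. The random variable

\[
\int_{S^2} g \circ A_\sigma(V, W)\, \mu(d\sigma) = \int_{S^2} g\left(\frac{V+W}{2} + \frac{|V-W|}{2}\,\sigma,\ \frac{V+W}{2} - \frac{|V-W|}{2}\,\sigma\right) \mu(d\sigma)
\]

is a function of the collision invariants i.e., it is of the form $\tilde g(V + W, |V|^2 + |W|^2)$. For all bounded $h_1\colon \mathbb{R}^3 \to \mathbb{R}$ and $h_2\colon \mathbb{R} \to \mathbb{R}$, we apply the previous computation to $h(v,w) = g(v,w)\, h_1(v+w)\, h_2(|v|^2+|w|^2)$ to get

\[
\begin{aligned}
& E\left(\left(\int_{S^2} g \circ A_\sigma(V, W)\, \mu(d\sigma)\right) h_1(V + W)\, h_2(|V|^2 + |W|^2)\right) \\
&\quad = \int_{S^2} E\big(g \circ A_\sigma(V, W)\, h_1(V + W)\, h_2(|V|^2 + |W|^2)\big)\, \mu(d\sigma) \\
&\quad = \int_{S^2} E\big(g \circ A_\sigma(V, W)\, h_1(V_\sigma + W_\sigma)\, h_2(|V_\sigma|^2 + |W_\sigma|^2)\big)\, \mu(d\sigma) \\
&\quad = E\big(g(V, W)\, h_1(V + W)\, h_2(|V|^2 + |W|^2)\big).
\end{aligned}
\]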

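(3) We use Item (2) and the equality $A_\sigma(v,w) = A_\Pi(v, w)$ when $\Pi = \widehat{\kappa - \sigma} \otimes \widehat{\kappa - \sigma}$ and $\kappa = \widehat{v - w}$ to write

\[
\int_{S^2} g\big(A_{\widehat{\kappa-\sigma}\otimes\widehat{\kappa-\sigma}}(V,W)\big)\, \mu(d\sigma) = E\big(g(V,W) \,\big|\, V + W, |V|^2 + |W|^2\big)
\]

where, for given vectors $u, v \in \mathbb{R}^3$, $u, v \neq 0$, we simply denote $\widehat{u \otimes v} = \hat u \otimes \hat v$. From Eq. (17), the left-hand side can be rewritten as an integral with respect to $\omega \in S^2$:

\[
\int_{S^2} g\big(A_{\omega\otimes\omega}(V,W)\big)\, 2\,\big|\widehat{V-W}\cdot\omega\big|\, \mu(d\omega) = \int_{S^2} g\big(A_{\widehat{\kappa-\sigma}\otimes\widehat{\kappa-\sigma}}(V,W)\big)\, \mu(d\sigma).
\]

If $\Pi = \omega \otimes \omega$, then $|\kappa \cdot \omega| = |(\omega\otimes\omega)\kappa| = |\Pi\kappa|$. Using that together with the definition of the measure $\nu$ on $\Pi(1)$ in Eq. (13), we have the result:

\[
\int_{S^2} g\big(A_{\omega\otimes\omega}(V,W)\big)\, 2\,\big|\widehat{V-W}\cdot\omega\big|\, \mu(d\omega)
= \int_{S^2} g\big(A_{\omega\otimes\omega}(V,W)\big)\, 2\,\big|(\omega\otimes\omega)\,\widehat{V-W}\big|\, \mu(d\omega)
= \int_{\Pi(1)} g\big(A_\Pi(V,W)\big)\, \big|\Pi\,\widehat{V-W}\big|\, \nu(d\Pi).
\]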


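(4) We use Th. ??. If $F \in \mathcal{E}(M \otimes M)$, then

\[
F = e^{U - K_{M\otimes M}(U)} \cdot M \otimes M, \qquad U \in \mathcal{S}_{M\otimes M},
\]

and there exists a neighborhood $I$ of $[0,1]$, where the one dimensional exponential family

\[
F_t = e^{tU - K_{M\otimes M}(tU)} \cdot M \otimes M, \qquad t \in I,
\]

exists. The random variable

\[
\hat f = \int_{S^2} f \circ A_\sigma\, \mu(d\sigma) \tag{25}
\]

is a positive probability density with respect to $M \otimes M$ because it is the conditional expectation in $M \otimes M$ of the positive density $f$.

In order to show that $E_{M\otimes M}\big(\hat f^{\,t}\big) < +\infty$ for $t \in I$, it is enough to consider the convex cases, $t < 0$ and $t > 1$, because otherwise $E_{M\otimes M}\big(\hat f^{\,t}\big) \le 1$. We have

\[
\int_{S^2} f \circ A_\sigma\, \mu(d\sigma) = \int_{S^2} e^{U \circ A_\sigma - K_{M\otimes M}(U)}\, \mu(d\sigma)
\]

so that in the convex cases:

\[
\begin{aligned}
E_{M\otimes M}\left(\left(\int_{S^2} f \circ A_\sigma\, \mu(d\sigma)\right)^{t}\right)
&\le E_{M\otimes M}\left(\int_{S^2} e^{tU \circ A_\sigma - tK_{M\otimes M}(U)}\, \mu(d\sigma)\right) \\
&= E_{M\otimes M}\left(e^{tU - tK_{M\otimes M}(U)}\right) \\
&= e^{K_{M\otimes M}(tU) - tK_{M\otimes M}(U)} < +\infty, \qquad t \in I \setminus [0, 1].
\end{aligned}
\]

To conclude, apply Bayes' formula for conditional expectation,

\[
E_{f\cdot M}(A \,|\, \mathcal{B}) = \frac{E_M(fA \,|\, \mathcal{B})}{E_M(f \,|\, \mathcal{B})},
\]

to the expressions of conditional expectation in Item (2) and Item (3) above.

□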


Remark 4. In the last Item, we compute a conditional expectation of a density $f$, that is

\[
E\big(f(V, W) \,\big|\, V + W, |V|^2 + |W|^2\big) = \tilde f\big(V + W, |V|^2 + |W|^2\big).
\]

The random variable $\tilde f\colon \mathbb{R}^3 \times \mathbb{R}_+ \to \mathbb{R}$ is the density of the image of $F\, dv\,dw$ with respect to the image of $M(v)M(w)\, dv\,dw$.
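The characterization of the sphere average as a conditional expectation on the collision invariants can be probed by simulation. The sketch below is illustrative only: the test function $g(v,w) = v_1^2$, the weight $h = |V|^2 + |W|^2$, the sample size and all names are assumptions made for the example. For this $g$, the average of $g \circ A_\sigma$ over the sphere has the closed form $m_1^2 + r^2/3$ with $m = (v+w)/2$ and $r = |v-w|/2$, because $\int \sigma_1\, \mu(d\sigma) = 0$ and $\int \sigma_1^2\, \mu(d\sigma) = 1/3$. The defining property of conditional expectation then requires $E(g\,h) = E(\hat g\, h)$ for every bounded (here: integrable) function $h$ of the invariants; a direct moment computation gives the common value $8$.

```python
import numpy as np

def invariants_check(n=400_000, seed=2):
    """Return Monte Carlo estimates of E[g h] and E[g_hat h] under M (x) M."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((n, 3))
    W = rng.standard_normal((n, 3))
    m = 0.5 * (V + W)                               # half of the momentum V + W
    r2 = 0.25 * np.sum((V - W) ** 2, axis=1)        # squared half relative speed
    g = V[:, 0] ** 2                                # g(v, w) = v_1^2
    g_hat = m[:, 0] ** 2 + r2 / 3.0                 # sphere average of g(A_sigma(V, W))
    h = np.sum(V ** 2, axis=1) + np.sum(W ** 2, axis=1)   # a function of the invariants
    return float((g * h).mean()), float((g_hat * h).mean())

direct, averaged = invariants_check()
print(direct, averaged)
```

Both estimates should be close to $8$; the second one has a visibly smaller Monte Carlo error, as expected from a conditional expectation.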

5.4. Interactions. We introduce here the crucial role played by microscopic interaction in the definition of the Boltzmann collision operator. In the physics literature, such interactions are referred to as the kinetic collision kernel and take into account the intermolecular forces suffered by particles during a collision [TT]. Before defining formally what we mean by interaction, we first observe that, if $M$ is the Maxwell density on $\mathbb{R}^3$ and $f, g \smile M$ then

\[
f \otimes g \smile M \otimes M
\]

where $M \otimes M$ is the standard normal density on $\mathbb{R}^6$ and $f \otimes g$ is a density on $\mathbb{R}^6$. Indeed, one has

\[
f(v) = e^{U(v) - K_M(U)} M(v), \quad U \in \mathcal{S}_M, \qquad
g(w) = e^{V(w) - K_M(V)} M(w), \quad V \in \mathcal{S}_M.
\]

It follows that the product density has the form

\[
f(v)g(w) = e^{U \oplus V - K_M(U) - K_M(V)}\, M(v)M(w), \qquad U \oplus V \in \mathcal{S}_M \oplus \mathcal{S}_M \subset \mathcal{S}_{M \otimes M},
\]

which implies $f \otimes g \smile M \otimes M$, with $K_{M\otimes M}(U \oplus V) = K_M(U) + K_M(V)$.

Definition 22. With the previous notations, we say that $b\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}_+$ is an interaction on $\mathcal{E}(M) \otimes \mathcal{E}(M)$, if $E_{f\otimes g}[b] < +\infty$, so that

\[
\frac{b}{E_{f\otimes g}[b]} \cdot f \otimes g\colon (v,w) \mapsto \frac{b(v, w)}{E_{f\otimes g}[b]}\, f(v)g(w)
\]

is a density.

Sometimes, we make the abuse of notation of writing $b \cdot f \otimes g$, where the obvious normalization is not written down. As can be seen, here interactions indicate only a class of suitable weight functions $b$ for which $b(v,w)f(v)g(w)$ is still (up to normalisation) a density. This is in accordance with the usual role played by the kinetic collision kernel (see [IS]).
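The additivity of the cumulant functional on product densities, $K_{M\otimes M}(U \oplus V) = K_M(U) + K_M(V)$, which underlies the product construction of this subsection, can be checked in a one-dimensional toy case with Gauss-Hermite quadrature. This is only a sketch: the dimension (one instead of three), the chart coordinates $U(x) = 0.3x$ and $V(x) = -0.1x^2$, and the number of nodes are assumptions made for the example. For these choices $K_M(U) = 0.045$ and $K_M(V) = -\tfrac12 \log 1.2$ in closed form.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Nodes and weights for the weight exp(-x^2/2); dividing by sqrt(2*pi)
# turns the weights into integration against the standard normal density.
x, wts = hermegauss(60)
wts = wts / np.sqrt(2.0 * np.pi)

def U(t):
    return 0.3 * t            # illustrative chart coordinate of f

def V(t):
    return -0.1 * t ** 2      # illustrative chart coordinate of g

# One-dimensional cumulants K_M(u) = log E_M[exp(u)].
K_U = float(np.log(np.sum(wts * np.exp(U(x)))))
K_V = float(np.log(np.sum(wts * np.exp(V(x)))))

# Cumulant of (U (+) V)(x1, x2) = U(x1) + V(x2) under the product density.
X1, X2 = np.meshgrid(x, x, indexing="ij")
W2 = np.outer(wts, wts)
K_sum = float(np.log(np.sum(W2 * np.exp(U(X1) + V(X2)))))

print(K_U, K_V, K_sum)
```

The value of `K_sum` agrees with `K_U + K_V` up to rounding, mirroring the identity stated before Definition 22.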


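According to the Portmanteau Theorem (??), it holds

\[
\frac{b}{E_{f\otimes g}[b]} \cdot f \otimes g \smile f \otimes g,
\]

which, in turn, is equivalent to $\frac{b}{E_{f\otimes g}[b]} \cdot f \otimes g \smile M \otimes M$, if and only if

\[
E_{f\otimes g}\big[b^{1+\epsilon}\big] < \infty, \qquad E_{b\cdot f\otimes g}\big[b^{-1-\epsilon}\big] < \infty
\]

for some $\epsilon > 0$.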


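Proposition 23. Let $b\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}_+$ be such that for some real $A \in \mathbb{R}$ and positive $B, C, \lambda \in \mathbb{R}_>$, it holds

\[
C\, |u - v|^{\lambda} \le b(u,v) \le A + B\, |u - v|^2.
\]

Then $b$ is an interaction on $\mathcal{E}(M) \times \mathcal{E}(M)$ and for all $f, g \smile M$ the following holds.

(1) $b \cdot f \otimes g \smile M \otimes M$.

(2) Assume moreover that the interaction $b$ is a function of the invariants only,

\[
b(v, w) = \tilde b\big(v + w, |v|^2 + |w|^2\big).
\]

It follows

\[
E_{M\otimes M}\big(b \cdot f \otimes g \,\big|\, V + W, |V|^2 + |W|^2\big) = b(V,W) \int_{S^2} f \otimes g \circ A_\sigma(V, W)\, \mu(d\sigma),
\]

and moreover a sufficiency relation holds i.e., for all integrable $F\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$, it holds

\[
E_{b\cdot f\otimes g}\big(F(V, W) \,\big|\, V + W, |V|^2 + |W|^2\big) = E_{f\otimes g}\big(F(V, W) \,\big|\, V + W, |V|^2 + |W|^2\big).
\]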

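Proof. 1. We can assume $E_{f\otimes g}[b] = 1$. As $b(u,v) \le A + B|u - v|^2 \le A + 2B\,|(u,v)|^2$, we have $b \in L^{\cosh-1}(M \otimes M) = L^{\cosh-1}(f \otimes g)$, and hence $b \in L^{1+\epsilon}(f \otimes g)$ for all $\epsilon > 0$. For the second inequality we use the Hardy-Littlewood-Sobolev inequality [25, Th. 4.3]. We have

\[
E_{b\cdot f\otimes g}\big[b^{-1-\epsilon}\big] = \iint \frac{f(u)g(v)}{b(u, v)^{\epsilon}}\, du\,dv \le C^{-\epsilon} \iint \frac{f(u)g(v)}{|u - v|^{\epsilon\lambda}}\, du\,dv.
\]

If $\frac{1}{\alpha} + \frac{1}{\beta} + \frac{\epsilon\lambda}{3} = 2$, by the H-L-S inequality the last integral is bounded by a constant times $\|f\|_\alpha \|g\|_\beta$. From $f, g \smile M$ we get that $\|f\|_\alpha \|g\|_\beta$ is finite for $\alpha, \beta$ in a right neighborhood of 1. There exists $\epsilon = \frac{3}{\lambda}\left(2 - \frac{1}{\alpha} - \frac{1}{\beta}\right) > 0$ satisfying all conditions.

2. It is a special case of the Conditioning Theorem 21.

□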


Let us discuss the differentiability of the operations we have just introduced. 

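Proposition 24.

(1) The product mapping $\mathcal{E}(M) \ni f \mapsto f \otimes f$ is a differentiable map into $\mathcal{E}(M \otimes M)$ with tangent mapping given for any vector field $X \in T\mathcal{E}(M)$ by

\[
T_f\mathcal{E}(M) \ni X_f \mapsto X_f \oplus X_f \in T_{f\otimes f}\mathcal{E}(M \otimes M).
\]

(2) Let $b$ be an interaction on $\mathcal{E}(M) \otimes \mathcal{E}(M)$. If the mapping

\[
f \mapsto \frac{b}{E_{f\otimes f}[b]} \cdot f \otimes f
\]

is defined on $\mathcal{E}(M)$ with values in $\mathcal{E}(M \otimes M)$, then it is differentiable with tangent mapping given for all vector fields $X \in T\mathcal{E}(M)$ by

\[
X_f \mapsto X_f \oplus X_f - E_{b\cdot f\otimes f}[X_f \oplus X_f].
\]

Proof.

(1) We have already proved that $f \smile M$ implies $f \otimes f \smile M \otimes M$ and that the mapping in the charts centered at $M$ and $M \otimes M$ respectively is represented as $\mathcal{S}_M \ni U \mapsto U \oplus U \in \mathcal{S}_{M\otimes M}$. The differential of the linear map is again $T_M\mathcal{E}(M) \ni V \mapsto V \oplus V \in T_{M\otimes M}\mathcal{E}(M \otimes M)$. The transport commutes with the $\oplus$ operation,

\[
(V - E_f[V]) \oplus (V - E_f[V]) = V \oplus V - E_{f\otimes f}[V \oplus V],
\]

and the result follows.

(2) Let $U$ be the coordinate of $f$ at $M$. By assumption, we have

\[
\frac{b}{E_{f\otimes f}[b]} \cdot f \otimes f \in \mathcal{E}(M \otimes M)
\]

and

\[
\frac{b}{E_{f\otimes f}[b]} \cdot f \otimes f = \frac{E_{M\otimes M}[b]}{E_{f\otimes f}[b]}\, \exp\big(U \oplus U - 2K_M(U)\big)\, \frac{b}{E_{M\otimes M}[b]} \cdot M \otimes M,
\]

so that the coordinate of $\frac{b}{E_{f\otimes f}[b]} \cdot f \otimes f$ in the chart centered at $\frac{b}{E_{M\otimes M}[b]} \cdot M \otimes M$ is

\[
U \oplus U - E_{b\cdot M\otimes M}[U \oplus U] = {}^e\mathbb{U}_{M\otimes M}^{b\cdot M\otimes M}(U \oplus U).
\]

The expression is linear and so is the expression of the tangent map

\[
V \mapsto V \oplus V - E_{b\cdot M\otimes M}[V \oplus V].
\]

In conclusion, for each vector field $X$ of $T\mathcal{E}(M)$, at $f$ we have $V = {}^e\mathbb{U}_f^M X_f$ and $V \oplus V = {}^e\mathbb{U}_{f\otimes f}^{M\otimes M}(X_f \oplus X_f)$, hence the action on $T_f\mathcal{E}(M)$ is as stated.

□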

Assume $f \smile M$ and let $b$ be an interaction on $f \otimes f$ which depends on the invariants only and such that $b \cdot f \otimes f \smile M \otimes M$. For each random variable $g \in L^{\cosh-1}(f) = L^{\cosh-1}(M)$ define $\bar g = \frac12\, g \oplus g$, which belongs to $L^{\cosh-1}(M \otimes M) = L^{\cosh-1}(f \otimes f)$. Define the operator

\[
A\colon L^{\cosh-1}(M) \to L^{\cosh-1}(M \otimes M),
\]

by

\[
Ag(v,w) = \int_{S^2} \bar g\big(A_\sigma(v,w)\big)\, \mu(d\sigma) - \bar g(v,w).
\]

As constant random variables are in the kernel of the operator $A$, we assume $E_f[g] = 0$.


5.5. Maxwell-Boltzmann and Boltzmann operator. As explained by C. Villani [??, 1.2.3], Maxwell obtained a weak form of the Boltzmann operator before Boltzmann himself. We rephrase such a Maxwell's weak form in geometric-probabilistic language by expanding and rigorously proving what was hinted at in [33].

Let $b\colon \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}_+$ be an interaction on $\mathcal{E}(M) \times \mathcal{E}(M)$ which depends on the invariants only and such that $b \cdot f \otimes f \smile M \otimes M$ if $f \smile M$, cf. Prop. 23. We shall call such a $b$ a proper interaction.

For each random variable $g \in L^{\cosh-1}(M)$ define $\bar g = \frac12\, g \oplus g$ by

\[
\bar g(v,w) = \tfrac12 \big(g(v) + g(w)\big),
\]

which is easily shown to belong to $L^{\cosh-1}(M \otimes M) = L^{\cosh-1}(f \otimes f)$. The mapping $g \mapsto \bar g$ is a version of the conditional expectation $E_{M\otimes M}(g \,|\, \mathcal{S})$, where $\mathcal{S}$ is the $\sigma$-algebra generated by symmetric random variables.

We define the operator

\[
A\colon L^{\cosh-1}(M) \to L^{\cosh-1}(M \otimes M),
\]

by

\[
Ag(v,w) = \int_{S^2} \bar g\big(A_\sigma(v,w)\big)\, \mu(d\sigma) - \bar g(v,w),
\]

which is a version of

\[
Ag = E_{M\otimes M}\big(E_{M\otimes M}(g \,|\, \mathcal{S}) \,\big|\, \mathcal{I}\big) - E_{M\otimes M}(g \,|\, \mathcal{S}) = E_{M\otimes M}(g \,|\, \mathcal{I}) - E_{M\otimes M}(g \,|\, \mathcal{S}),
\]

where $\mathcal{I} \subset \mathcal{S}$ is the $\sigma$-algebra generated by the collision invariants $(v, w) \mapsto (v + w, |v|^2 + |w|^2)$.

As constant random variables are in the kernel of the operator $A$, we assume $E_f[g] = 0$. Analogously, as the kernel of the operator contains all symmetric random variables, we could always assume that $g$ is anti-symmetric.

The nonlinear operator $f \mapsto E_{b\cdot f\otimes f}[Ag]$ is Maxwell's weak form of the Boltzmann operator, $g$ being a test function.

Proposition 25. Given a proper interaction $b$ and a density $f \in \mathcal{E}(M)$, the linear map

\[
g \mapsto E_{b\cdot f\otimes f}[Ag]
\]

is continuous.

(1) It can he represented in the duality x by 


[Ag] — Rm®m 


b ( Rm^M ( ^ ® “tt 


M M 




where $Q$ is the Boltzmann operator with interaction $b$,

\[
Q(f)(v) = \int_{\mathbb{R}^3} \int_{S^2} \big(f \otimes f(A_\sigma(v, w)) - f(v)f(w)\big)\, b(v, w)\, \mu(d\sigma)\, dw.
\]
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(2) Especially, if f = yj^ g ^ jQg then 


Efc./(gi/ 



{Q{f)/f,u)j . 


Proof. The continuity follows from the Portmanteau theorem and the continuity of the conditional expectation. Item (1) follows from the projection properties of the conditional expectation and from general properties of Orlicz spaces. Item (2) is a special case of the previous one. □


It follows from the previous theorem and from the discussion in [33, Prop. 10] that $f \mapsto Q(f)/f$ is a vector field in the cotangent bundle ${}^*T\mathcal{E}(M)$ and its flow

\[
Df_t = \frac{\dot f_t}{f_t} = \frac{Q(f_t)}{f_t}
\]

is equivalent to the standard Boltzmann equation $\dot f_t = Q(f_t)$. We do not discuss in this paper the implications of that presentation of the Boltzmann equation for the existence properties. We turn our attention to the comparison of the Boltzmann field to the gradient field of the entropy.


5.6. Entropy generation. If $t \mapsto p(t)$ is a curve in $\mathcal{E}(M)$, the entropy of $p(t)$ is defined for all $t$ and the variation of the entropy along the curve is computed as

\[
\frac{d}{dt} H(p_t) = -\frac{d}{dt} \int p(v;t) \log p(v;t)\, dv = -\int \big(\log p(v;t) + 1\big)\, \frac{\partial}{\partial t} p(v;t)\, dv = -\int \log p(v;t)\, \frac{\partial}{\partial t} p(v;t)\, dv.
\]

In our setup, the computation takes the following form. Let $t \mapsto p_t$ be a differentiable curve in $\mathcal{E}(M)$ with velocity $t \mapsto Dp_t \in {}^*B_{p_t}$. As the gradient $\nabla H$ is given at $p$ by $(\nabla H)(p) = -(\log p + H(p)) \in B_p$, we have

\[
\frac{d}{dt} H(p(t)) = -\big\langle \log p(t) + H(p(t)), Dp(t) \big\rangle_{p(t)}.
\]

In particular, if $Dp(t) = \dot p(v;t)/p(v;t)$ we recover the previous computation:

\[
\frac{d}{dt} H(p(t)) = -\int \big(\log p(t) + H(p(t))\big)\, \dot p(v;t)\, dv = -\int \log p(v;t)\, \frac{\partial}{\partial t} p(v;t)\, dv.
\]

Assume now that $(p,t) \mapsto \gamma(t;p)$ is the flow of a vector field $F\colon \mathcal{E}(M) \to {}^*T\mathcal{E}(M)$. Then

\[
\begin{aligned}
\frac{d}{dt} H(\gamma(t;p)) &= -\big\langle \log \gamma(t;p) + H(\gamma(t;p)), D\gamma(t;p) \big\rangle_{\gamma(t;p)} \\
&= -\big\langle \log \gamma(t;p) + H(\gamma(t;p)), F(\gamma(t;p)) \big\rangle_{\gamma(t;p)},
\end{aligned}
\]

that is, for each $p \in \mathcal{E}(M)$, the entropy production at $p$ along the vector field $F$ is

\[
(\nabla_F H)(p) = -\big\langle \log p + H(p), F(p) \big\rangle_p = -\int \big(\log p(v) + H(p)\big)\, F(v;p)\, p(v)\, dv = -\int \log p(v)\, F(v;p)\, p(v)\, dv.
\]

In particular, if the vector field $F$ is the Boltzmann vector field, $F(f) = Q(f)/f$, we have that for each $f \in \mathcal{E}(M)$ the Boltzmann entropy production is

^(/) = (VQ(/)//i/)(/) = -{\ogf + H{f),Q{f)/f)^ 

= -Efo./®/ [dl log /] = [A (log / 0 log /)] , 

where 


(log / 0 log /) (n, w) = log f{v) - log f{w) = log 


/(^) 

fiw) 


We recover a well-known formula for the entropy production of the Boltzmann operator 
We now proceed to compute the covariant derivative of the entropy production. 


Proposition 26.

Let $X$ be a vector field of $T\mathcal{E}(M)$ and let $F$ be a vector field of ${}^*T\mathcal{E}(M)$.

(1) The Hessian of the entropy (in the exponential connection) is

\[
D_X \nabla H(f) = -X.
\]

(2) The covariant derivative of the entropy production along $F$ is

\[
D_X \mathcal{D} = -\langle X, F \rangle + \langle \nabla H, D_X F \rangle.
\]

Proof. We note that the entropy production along a vector field $F$, $\mathcal{D} = (\nabla_F H)(p) = \langle \nabla H, F \rangle$, is a function of the duality coupling of $T\mathcal{E}(M) \times {}^*T\mathcal{E}(M)$, so that we can apply Prop. 16 to compute its covariant derivative along $X$ as

\[
D_X \mathcal{D} = \langle D_X \nabla H, F \rangle + \langle \nabla H, D_X F \rangle.
\]

The first term at $p$ is

\[
\langle D_X \nabla H, F \rangle (p) = \langle D_X \nabla H(p), F(p) \rangle_p.
\]

Let us compute $D_X \nabla H(p)$, which is the Hessian of the entropy in the exponential connection. First, we compute the expression of $\nabla H(q) = -(\log q + H(q)) \in B_q$, $q \in \mathcal{E}(M)$, in the chart centered at $p$. We have

\[
-\log q = K_p(U) - U - \log p, \qquad U \in \mathcal{S}_p, \quad q = e_p(U),
\]

and

\[
H_p(U) = -E_q[\log q] = K_p(U) - dK_p(U)[U] + dK_p(U)[\nabla H(p)] + H(p)
\]

so that

\[
(\nabla H) \circ e_p(U) = -U + dK_p(U)[U] + \nabla H(p) - dK_p(U)[\nabla H(p)]
\]

and, finally, the expression of $\nabla H$ in the chart centered at $p$ is

\[
(\nabla H)_p(U) = {}^e\mathbb{U}_{e_p(U)}^{p}\, (\nabla H) \circ e_p(U) = \nabla H(p) - U.
\]

Note that this function is affine, and its derivative in the direction $X(p)$ is $d(\nabla H)_p(U)[X(p)] = -X(p)$. It follows that $D_X \nabla H = -X$. □

The application of this computation to the Boltzmann field, i.e. $F(f) = Q(f)/f$, requires the existence of the covariant derivative of the Boltzmann operator. We leave this discussion as a research plan.


6. Weighted Orlicz-Sobolev model space


We show in this section that the Information Geometry formalism described in Sections ?? and ?? is robust enough to take into account differential operators (e.g., the classical Laplacian). This leads naturally to the introduction of weighted Orlicz-Sobolev spaces. While the case "without derivative" studied in Section ?? was well-suited for the study of the fine properties of the Kullback-Leibler divergence, we illustrate in Sec. 6.3 our use of Orlicz-Sobolev spaces with the fine study of the Hyvärinen divergence. This is a special type of divergence between densities that involves an $L^2$-distance between gradients of densities [21] which has multiple applications. In particular, it is related to the so-called Fisher information as it is defined for example in [10, p. 49], which has deep connections with the Boltzmann equation, see [39]. However, the name Fisher information should not be used in a statistics context, where it rather refers to the expression in coordinates of the metric of statistical models considered as pseudo-Riemannian manifolds, e.g., [1].

We introduce the Orlicz-Sobolev spaces with weight $M$, Maxwell density on $\mathbb{R}^n$,

\[
W^1_{\cosh-1}(M) = \big\{ f \in L^{\cosh-1}(M) \,\big|\, \partial_j f \in L^{\cosh-1}(M),\ j = 1, \dots, n \big\}, \tag{26}
\]

\[
W^1_{(\cosh-1)_*}(M) = \big\{ f \in L^{(\cosh-1)_*}(M) \,\big|\, \partial_j f \in L^{(\cosh-1)_*}(M),\ j = 1, \dots, n \big\}, \tag{27}
\]

where $\partial_j$ is the derivative in the sense of distributions. They are both Banach spaces, see [25, §10]. (The classical Adams's treatise [2, Ch 8] has Orlicz-Sobolev spaces, but does not consider the case of a weight. The product functions $(u,x) \mapsto (\cosh-1)(u)M(x)$ and $(u,x) \mapsto (\cosh-1)_*(u)M(x)$ are $\phi$-functions according to Musielak's definition.) The norm on $W^1_{\cosh-1}(M)$ is

\[
\|f\|_{W^1_{\cosh-1}(M)} = \|f\|_{L^{\cosh-1}(M)} + \sum_{j=1}^n \|\partial_j f\|_{L^{\cosh-1}(M)} \tag{28}
\]

and similarly for $W^1_{(\cosh-1)_*}(M)$. One begins with a first technical result in order to relate such spaces with statistical exponential families:

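Proposition 27. Given $u \in \mathcal{S}_M \cap W^1_{\cosh-1}(M)$ and $f \in W^1_{\cosh-1}(M)$, one has

\[
f\, e^{u - K_M(u)} \in W^1_{(\cosh-1)_*}(M).
\]

Proof. For simplicity, set $G = e^{u - K_M(u)}$. One knows from the Portmanteau Theorem that $GM \in \mathcal{E}(M)$ and therefore, there exists $\epsilon > 0$ such that $G \in L^{1+\epsilon}(M)$. Let us prove that $fG \in L^{(\cosh-1)_*}(M)$. First of all, since $G \in L^{1+\epsilon}(M) \subset L^{(\cosh-1)_*}(M)$ and $f \in L^{\cosh-1}(M)$ one has $fG \in L^1(M)$. Moreover, for any $x \in \mathbb{R}^n$, according to the classical Young's inequality,

\[
|f(x)G(x)| \le \frac1p |f(x)|^p + \frac1q |G(x)|^q \qquad \forall p > 1, \quad \frac1p + \frac1q = 1.
\]

Since $\Phi_*$ is increasing and convex,

\[
\Phi_*\big(f(x)G(x)\big) \le \frac1p\, \Phi_*\big(|f(x)|^p\big) + \frac1q\, \Phi_*\big(G(x)^q\big), \qquad x \in \mathbb{R}^n.
\]

Now, since $f \in L^{\cosh-1}(M)$, one has $|f|^p \in L^{(\cosh-1)_*}(M)$ for $p > 1$, i.e. $\Phi_*(|f|^p) \in L^1(M)$ for all $p > 1$. Choosing then $1 < q < 1 + \epsilon$ one has $G^q \in L^{\frac{1+\epsilon}{q}}(M) \subset L^{(\cosh-1)_*}(M)$ so that $\Phi_*(G^q) \in L^1(M)$. This proves that $\Phi_*(fG) \in L^1(M)$ i.e. $fG \in L^{(\cosh-1)_*}(M)$.

In the same way, since $f \in W^1_{\cosh-1}(M)$ one has

\[
G\, \partial_j f \in L^{(\cosh-1)_*}(M) \qquad \forall j = 1, \dots, n.
\]

Moreover, $u \in W^1_{\cosh-1}(M)$ so that, for any $j = 1, \dots, n$, $\partial_j u \in L^r(M)$ for any $r > 1$ and therefore $\Phi_*(|\partial_j u|^p) \in L^1(M)$ for any $p > 1$. Repeating the above argument we get therefore

\[
f\, G\, \partial_j u \in L^{(\cosh-1)_*}(M) \qquad \forall j = 1, \dots, n.
\]

Since $\partial_j(fG) = G\, \partial_j f + G f\, \partial_j u$ a.e., one gets $\partial_j(fG) \in L^{(\cosh-1)_*}(M)$ for any $j = 1, \dots, n$, which proves the result. □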

Remark 5. As a particular case of the above Proposition, if $u \in \mathcal{S}_M \cap W^1_{\cosh-1}(M)$ then $e^{u - K_M(u)} \in W^1_{(\cosh-1)_*}(M)$ with $\nabla e^{u - K_M(u)} = \nabla u\, e^{u - K_M(u)}$.

The Orlicz-Sobolev spaces $W^1_{\cosh-1}(M)$ and $W^1_{(\cosh-1)_*}(M)$, defined in Eq. (26) and (27) respectively, are instances of Gaussian spaces of random variables and they inherit from the corresponding Orlicz spaces a duality form. In this duality the adjoint of the partial derivative has a special form coming from the form of the weight $M$, see e.g. [??, Ch. V]. We have the following:

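Proposition 28.

(1) Let $f \in C_0^\infty(\mathbb{R}^n)$ and $g \in W^1_{\cosh-1}(M)$. Then

\[
\langle f, \partial_j g \rangle_M = \langle X_j f - \partial_j f, g \rangle_M \tag{29}
\]

where $X_j$ is the multiplication operator by the $j$-th coordinate $x_j$.

(2) If $f \in W^1_{(\cosh-1)_*}(M)$, then $X_j f \in L^{(\cosh-1)_*}(M)$. More precisely, there exists $C > 0$ such that

\[
\|X_j f\|_{L^{(\cosh-1)_*}(M)} \le C\, \|f\|_{W^1_{(\cosh-1)_*}(M)}. \tag{30}
\]

(3) If $f \in W^1_{(\cosh-1)_*}(M)$ and $g \in W^1_{\cosh-1}(M)$, then (29) holds.

Proof. (1) As $fM \in C_0^\infty(\mathbb{R}^n)$, we have by definition of distributional derivative,

\[
\langle f, \partial_j g \rangle_M = \int f(x)\, \partial_j g(x)\, M(x)\, dx = -\int g(x)\, \partial_j\big(f(x)M(x)\big)\, dx = \int g(x) \big(x_j f(x) - \partial_j f(x)\big) M(x)\, dx = \langle X_j f - \partial_j f, g \rangle_M.
\]

(2) Let us observe first that, according to Hölder's inequality,

\[
E_M[|X_j f|] \le 2\, \|X_j\|_{L^{\cosh-1}(M)}\, \|f\|_{L^{(\cosh-1)_*}(M)} < \infty,
\]

i.e. $X_j f \in L^1(M)$. Since $\Phi_* = (\cosh-1)_*$ enjoys the so-called $\Delta_2$-condition, to prove the stronger result $X_j f \in L^{(\cosh-1)_*}(M)$, it is enough to show that $E_M[\Phi_*(X_j f)] < \infty$. First of all, using the tensorization property of the Gaussian measure, i.e. the fact that $M(x) = M_1(x_1) \cdots M_1(x_n)$ for any $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ where $M_1$ stands for the one-dimensional standard Gaussian, we claim that it is enough to prove the result for $n = 1$. Indeed, given $f \in W^1_{(\cosh-1)_*}(M)$ and $x_j \in \mathbb{R}$ ($j = 1, \dots, n$), any $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ can be identified with $x = (x_j, \bar x)$ with $\bar x \in \mathbb{R}^{n-1}$ and $X_j f(x) = x_j F_{\bar x}(x_j)$ where $F_{\bar x}(y) = f(y, \bar x)$ for any $\bar x \in \mathbb{R}^{n-1}$, $y \in \mathbb{R}$. We also set $M_{n-1}(\bar x) = M(x)/M_1(x_j)$. Then, for a.e. $\bar x \in \mathbb{R}^{n-1}$, $F_{\bar x} \in W^1_{(\cosh-1)_*}(M_1)$ with

\[
\int_{\mathbb{R}^{n-1}} M_{n-1}(\bar x)\, d\bar x \int_{\mathbb{R}} \Phi_*\big(F_{\bar x}(y)\big)\, M_1(y)\, dy = \int_{\mathbb{R}^n} \Phi_*\big(f(x)\big)\, M(x)\, dx
\]

and

\[
\int_{\mathbb{R}^{n-1}} M_{n-1}(\bar x)\, d\bar x \int_{\mathbb{R}} M_1(y)\, \Phi_*\big(F'_{\bar x}(y)\big)\, dy = \int_{\mathbb{R}^n} M(x)\, \Phi_*\big(\partial_j f(x)\big)\, dx
\]

where $F'$ denotes the distributional derivative of $F = F(y)$. In particular, if there exists $C > 0$ such that

\[
\int_{\mathbb{R}} \Phi_*\big(yF(y)\big)\, M_1(y)\, dy \le C \int_{\mathbb{R}} \Big(\Phi_*\big(F(y)\big) + \Phi_*\big(F'(y)\big)\Big)\, M_1(y)\, dy \qquad \forall F \in W^1_{(\cosh-1)_*}(M_1), \tag{31}
\]

we get the desired result.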


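Let us then prove (31) and fix $F \in W^1_{(\cosh-1)_*}(M_1)$. From $\Phi_*(y) = \int_0^{|y|} \operatorname{arsinh} u\, du$ together with the oddness of $\operatorname{arsinh}$ we obtain

\[
\Phi_*\big(yF(y)\big) = \int_0^{|yF(y)|} \operatorname{arsinh} u\, du = \int_0^{|F(y)|} |y| \operatorname{arsinh}(|y|\, v)\, dv = \int_0^{|F(y)|} y \operatorname{arsinh}(yv)\, dv.
\]

Writing for simplicity

\[
G(y) := \int_0^{|F(y)|} \operatorname{arsinh}(yv)\, dv,
\]

one has

\[
\int_{\mathbb{R}} \Phi_*\big(yF(y)\big)\, M_1(y)\, dy = \int_{\mathbb{R}} y\, M_1(y)\, G(y)\, dy = -\int_{\mathbb{R}} G(y)\, M_1'(y)\, dy = \int_{\mathbb{R}} M_1(y)\, G'(y)\, dy.
\]

Now, the derivative of $G$ exists because of the assumption $F \in W^1_{(\cosh-1)_*}(M_1)$ (that is, $|F| \in W^1_{(\cosh-1)_*}(M_1)$ and its derivative is given by the derivation of a composite function) and it is computed as

\[
G'(y) = \operatorname{arsinh}\big(y\, |F(y)|\big)\, \frac{d}{dy} |F(y)| + \int_0^{|F(y)|} \frac{v}{\sqrt{1 + y^2 v^2}}\, dv.
\]

Using Young's inequality with $\Phi = \cosh - 1$ and $\Phi_* = (\cosh-1)_*$ we get

\[
G'(y) \le \Phi\Big(\operatorname{arsinh}\big(y\, |F(y)|\big)\Big) + \Phi_*\Big(\frac{d}{dy} |F(y)|\Big) + \frac{\sqrt{1 + y^2 |F(y)|^2} - 1}{y^2}.
\]

All the terms in the right-hand side of the above inequality are integrable with respect to the measure $M_1(y)\, dy$ over $\mathbb{R}$. Indeed the first term is bounded as

\[
\Phi\Big(\operatorname{arsinh}\big(y\, |F(y)|\big)\Big) = \sqrt{1 + y^2 |F(y)|^2} - 1 \le \sqrt 2\, \big(1 \vee |yF(y)|\big)
\]

and $y \mapsto yF(y) \in L^1(\mathbb{R}, M_1(y)\,dy)$. The second term is integrable by assumption.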
The only concern is then the last term. For any r > 0, 


„,,di+FTdht-1 1 

Miiy)- - 5 - dy ^ - 


>r 


\F{y)\MMdy^- / ^.{F{y))M,{y)dy 

f ./TU 


while. 



Mife) 


G + y^\F(y)f 

y2 


1 

-dy ^ 



\F{y)\ 

\y\ 


dy ^ 



\F{y)\ 

\y\ 


dy. 


Now, splitting the integral into the two integrals and one can use the 
one-dimensional Hardy inequality in Orlicz-Sobolev space na to get that there 
exists C > 0 such that 


( 3 ) 


\Fiy)\ 

bl 


dy^G ^,iF'{y))dy. 


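This completes the proof of (31).

For the third item, recall that (29) holds for any $f \in C_0^\infty(\mathbb{R}^n)$ and any $g \in W^1_{\cosh-1}(M)$. Since $\Phi_* = (\cosh-1)_*$ enjoys the $\Delta_2$-condition, it is a well-known fact that $C_0^\infty(\mathbb{R}^n)$ is dense in $W^1_{(\cosh-1)_*}(M)$ for the norm $\|\cdot\|_{W^1_{(\cosh-1)_*}(M)}$. Therefore, approximating any $f \in W^1_{(\cosh-1)_*}(M)$ by a sequence of $C_0^\infty(\mathbb{R}^n)$ functions, we deduce the result from point 1.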

□ 
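The proof above relies on Young's inequality for the conjugate pair $\Phi = \cosh - 1$ and $\Phi_*(y) = \int_0^{|y|} \operatorname{arsinh} u\, du$. A small numerical sketch may help fix ideas; it assumes the closed form $\Phi_*(y) = |y| \operatorname{arsinh}|y| - \sqrt{1+y^2} + 1$ (obtained by integrating $\operatorname{arsinh}$), and the grids are arbitrary illustrative choices. It checks $xy \le \Phi(x) + \Phi_*(y)$ on a grid, with equality along $y = \sinh x$.

```python
import numpy as np

def phi(x):
    """Young function Phi(x) = cosh(x) - 1."""
    return np.cosh(x) - 1.0

def phi_star(y):
    """Conjugate function: integral of arsinh from 0 to |y|, in closed form."""
    y = np.abs(y)
    return y * np.arcsinh(y) - np.sqrt(1.0 + y * y) + 1.0

x = np.linspace(0.0, 5.0, 201)
y = np.linspace(0.0, 50.0, 201)
Xg, Yg = np.meshgrid(x, y)

# Young's inequality: Phi(x) + Phi_*(y) - x*y >= 0 on the whole grid.
min_gap = float((phi(Xg) + phi_star(Yg) - Xg * Yg).min())

# Equality case y = Phi'(x) = sinh(x).
equality_gap = float(np.max(np.abs(phi(x) + phi_star(np.sinh(x)) - x * np.sinh(x))))

print(min_gap, equality_gap)
```

The minimal gap is nonnegative up to rounding and the equality defect along $y = \sinh x$ is at rounding level.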



















Remark 6. In the second Item of the above Proposition, notice that a priori $X_j f$ belongs to $L^{(\cosh-1)_*}(M)$ but not to $W^1_{(\cosh-1)_*}(M)$, $j = 1, \dots, n$. For this to be true, one would require that $\partial_k f \in W^1_{(\cosh-1)_*}(M)$ for any $k = 1, \dots, n$.


II ^S + C, ll/llvr..,_.,.,„, V/ 6 

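6.1. Stein and Laplace operators. Following the language of [??, Chapter V], Item 3 of the above Proposition can be reformulated saying that

\[
\langle f, \partial_j g \rangle_M = \langle \delta_j f, g \rangle_M \qquad \forall j = 1, \dots, n
\]

where

\[
\delta_j f = X_j f - \partial_j f.
\]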
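The Gaussian integration by parts $\langle f, \partial g \rangle_M = \langle \delta f, g \rangle_M$ with $\delta f = Xf - \partial f$ can be verified exactly on polynomials in dimension one. The sketch below uses Gauss-Hermite quadrature, which is exact for polynomial integrands of the degrees involved; the helper names `gauss_mean` and `stein` and the choice $f(x) = x^2$, $g(x) = x^3$ are illustrative assumptions. Both sides equal $9$.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)   # integrate against the N(0, 1) density

def gauss_mean(p):
    """E_M[p(X)] for a polynomial p, exact for degree below 80."""
    return float(np.sum(weights * p(nodes)))

def stein(p):
    """One-dimensional Stein operator: (delta p)(x) = x p(x) - p'(x)."""
    return Polynomial([0.0, 1.0]) * p - p.deriv()

f = Polynomial([0.0, 0.0, 1.0])          # f(x) = x^2
g = Polynomial([0.0, 0.0, 0.0, 1.0])     # g(x) = x^3

lhs = gauss_mean(f * g.deriv())          # <f, g'>_M = E[3 x^4] = 9
rhs = gauss_mean(stein(f) * g)           # <x f - f', g>_M = E[x^6 - 2 x^4] = 9
print(lhs, rhs)
```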

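This allows us to define the Stein operator $\delta$ on $L^{(\cosh-1)_*}(M)$:

\[
\delta\colon f \in \mathrm{Dom}(\delta) \subset L^{(\cosh-1)_*}(M) \mapsto \delta f = (\delta_j f)_{j=1,\dots,n} \in \big(L^{(\cosh-1)_*}(M)\big)^n,
\]

where the domain $\mathrm{Dom}(\delta)$ of $\delta$ is exactly $W^1_{(\cosh-1)_*}(M)$ according to point 2. of the above Proposition. Notice that, since $\Phi_*$ enjoys the $\Delta_2$-condition, $\mathrm{Dom}(\delta)$ is dense in $L^{(\cosh-1)_*}(M)$. One deduces then easily that $\delta$ is a closed and densely defined operator in $L^{(\cosh-1)_*}(M)$.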

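One sees that

\[
\langle f, \operatorname{div} g \rangle_M := \sum_{j=1}^n \langle f, \partial_j g_j \rangle_M = \sum_{j=1}^n \langle \delta_j f, g_j \rangle_M =: \langle \delta f, g \rangle_M \tag{32}
\]

\[
\forall f \in W^1_{(\cosh-1)_*}(M), \qquad g = (g_j)_{j=1,\dots,n} \in \big(W^1_{\cosh-1}(M)\big)^n,
\]

where $\operatorname{div} g = \sum_{j=1}^n \partial_j g_j$ is the divergence of $g$. This allows us to define the adjoint operator $\delta^*$ as follows, see [10]:

\[
\delta^*\colon \mathrm{Dom}(\delta^*) \subset \big(L^{\cosh-1}(M)\big)^n \to L^{\cosh-1}(M)
\]

with

\[
\mathrm{Dom}(\delta^*) = \Big\{ g = (g_j)_{j=1,\dots,n} \in \big(L^{\cosh-1}(M)\big)^n \,\Big|\, (\exists c > 0)\,(\forall f \in \mathrm{Dom}(\delta))\ \ |\langle g, \delta f \rangle_M| \le c\, \|f\|_{L^{(\cosh-1)_*}(M)} \Big\}
\]

and

\[
\langle g, \delta f \rangle_M = \langle \delta^* g, f \rangle_M \qquad \forall g \in \mathrm{Dom}(\delta^*),\ \forall f \in \mathrm{Dom}(\delta).
\]

One sees from (32) that

\[
\big(W^1_{\cosh-1}(M)\big)^n \subset \mathrm{Dom}(\delta^*) \quad \text{and} \quad \delta^* g = \nabla \cdot g \qquad \forall g \in \big(W^1_{\cosh-1}(M)\big)^n.
\]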

n 

Efe"V)M <c||/||y. v/etyiU-D.IAn. 

i=i 
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and one sees that there exists C > 0 snch that, for any ?' = and any f e 

W^(cosh-l).(^), it holds 

\{9j,djf)J^C\\f\\y^. 

|(%„/)mI ^CWfWy^ V/ e iy(Lh-i)*W 

Remark 7. Notice that, since $\Phi = \cosh - 1$ does not satisfy the $\Delta_2$-condition, it is not clear whether $\mathrm{Dom}(\delta^*)$ is dense in $\big(L^{\cosh-1}(M)\big)^n$ or not. However, from the general theory of adjoint operators and since $\delta$ is a closed densely defined operator in $L^{(\cosh-1)_*}(M)$, the domain $\mathrm{Dom}(\delta^*)$ is dense in $\big(L^{\cosh-1}(M)\big)^n$ endowed with the weak-$*$ topology. Moreover, $\delta^*$ is a closed operator from $\big(L^{\cosh-1}(M)\big)^n$ to $L^{\cosh-1}(M)$ (see [??] for details).

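We also define the gradient operator

\[
\nabla\colon \mathrm{Dom}(\nabla) \subset L^{\cosh-1}(M) \to \big(L^{\cosh-1}(M)\big)^n
\]

by

\[
\mathrm{Dom}(\nabla) = W^1_{\cosh-1}(M) \quad \text{and} \quad \nabla f = (\partial_j f)_{j=1,\dots,n}, \qquad f \in \mathrm{Dom}(\nabla).
\]

One sees that, if $f \in \mathrm{Dom}(\nabla^2)$, i.e. if $f \in W^1_{\cosh-1}(M)$ is such that $\nabla f \in \big(W^1_{\cosh-1}(M)\big)^n$, then $\nabla f \in \mathrm{Dom}(\delta^*)$ and

\[
\delta^* \nabla f = \sum_{j=1}^n \partial^2_{jj} f = \Delta f
\]

corresponds to the Laplace operator.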

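From now on, let $W^{-1}_{\cosh-1}(M)$ be the dual space of $W^1_{(\cosh-1)_*}(M)$:

\[
W^{-1}_{\cosh-1}(M) := \big(W^1_{(\cosh-1)_*}(M)\big)'
\]

i.e. the set of all continuous linear forms $F\colon W^1_{(\cosh-1)_*}(M) \to \mathbb{R}$, and let $\langle \cdot, \cdot \rangle_{-1,1}$ denote the duality pairing:

\[
\langle F, u \rangle_{-1,1} = F(u) \qquad \forall u \in W^1_{(\cosh-1)_*}(M),\ F \in W^{-1}_{\cosh-1}(M).
\]

For any $f \in \mathrm{Dom}(\nabla)$, let

\[
L_f\colon u \in W^1_{(\cosh-1)_*}(M) \mapsto \langle \nabla f, \delta u \rangle_M \in \mathbb{R}.
\]

One easily checks that $L_f$ is continuous and defines therefore an element of $W^{-1}_{\cosh-1}(M)$. Clearly, the operator $f \in \mathrm{Dom}(\nabla) = W^1_{\cosh-1}(M) \mapsto L_f \in W^{-1}_{\cosh-1}(M)$ is linear. Using the identification $\Delta = \delta^* \nabla$, we denote it by

\[
\boldsymbol{\Delta}\colon W^1_{\cosh-1}(M) \to W^{-1}_{\cosh-1}(M), \qquad f \mapsto \boldsymbol{\Delta} f := L_f,
\]

defined as

\[
\langle \boldsymbol{\Delta} f, u \rangle_{-1,1} = \langle \nabla f, \delta u \rangle_M, \qquad f \in W^1_{\cosh-1}(M),\ u \in W^1_{(\cosh-1)_*}(M).
\]

Notice that, with this definition, $\boldsymbol{\Delta} f$ is an element of $W^{-1}_{\cosh-1}(M)$ whereas, from the above observation, $\Delta f \in L^{\cosh-1}(M)$. Also the domains of both operators are different.

Of course, if $f \in \mathrm{Dom}(\nabla^2)$ then $\boldsymbol{\Delta} f$ is actually an element of $L^{\cosh-1}(M)$ and coincides with $\Delta f$, namely, in such a case

\[
\langle \boldsymbol{\Delta} f, u \rangle_{-1,1} = \langle \Delta f, u \rangle_M \qquad \forall u \in W^1_{(\cosh-1)_*}(M).
\]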
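Remark 8. Given $f \in W^1_{\cosh-1}(M)$ and $u \in C_0^\infty(\mathbb{R}^n)$ one has

\[
\langle \boldsymbol{\Delta} f, u \rangle_{-1,1} = \langle \nabla f, \delta u \rangle_M = \sum_{j=1}^n \langle \partial_j f, \delta_j u \rangle_M
\]

but, since $\delta_j u \in W^1_{(\cosh-1)_*}(M)$, one can use (29) to get

\[
\langle \boldsymbol{\Delta} f, u \rangle_{-1,1} = \sum_{j=1}^n \langle f, \delta_j^2 u \rangle_M.
\]

One readily computes, for any $j = 1, \dots, n$, $\delta_j^2 u = (x_j^2 - 1)u - 2x_j \partial_j u + \partial_{jj} u$, so that

\[
\langle \boldsymbol{\Delta} f, u \rangle_{-1,1} = \big\langle f, (|X|^2 - n)u - 2X \cdot \nabla u + \Delta u \big\rangle_M \qquad \forall u \in C_0^\infty(\mathbb{R}^n).
\]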
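The formula for the iterated Stein operator, $\delta^2 u = (x^2 - 1)u - 2x\,u' + u''$ in dimension one, can be confirmed by exact polynomial arithmetic. This is a minimal sketch; the test polynomial and the helper name `stein` are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = Polynomial([0.0, 1.0])

def stein(p):
    """One-dimensional Stein operator: (delta p)(x) = x p(x) - p'(x)."""
    return x * p - p.deriv()

u = Polynomial([1.0, -2.0, 0.5, 3.0])    # arbitrary test polynomial

lhs = stein(stein(u))                                        # delta applied twice
rhs = (x * x - 1.0) * u - 2.0 * x * u.deriv() + u.deriv(2)   # closed formula
max_coeff_gap = float(np.max(np.abs((lhs - rhs).coef)))
print(max_coeff_gap)
```

The coefficient gap is zero up to rounding, for any choice of the polynomial `u`.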

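Remark 9. Since the constant function $1 \in W^1_{(\cosh-1)_*}(M)$, one checks easily that

\[
\langle \boldsymbol{\Delta} f, 1 \rangle_{-1,1} = \langle X, \nabla f \rangle_M = E_M[X \cdot \nabla f] \qquad \forall f \in W^1_{\cosh-1}(M)
\]

where we notice that $X \cdot \nabla f \in L^1(M)$ for any $f \in W^1_{\cosh-1}(M)$. Actually, since $X \in \big(W^1_{(\cosh-1)_*}(M)\big)^n$, one can use (29) again to get

\[
\langle X, \nabla f \rangle_M = \sum_{j=1}^n \langle \delta_j X_j, f \rangle_M = \sum_{j=1}^n \langle X_j^2 - 1, f \rangle_M
\]

i.e.

\[
\langle \boldsymbol{\Delta} f, 1 \rangle_{-1,1} = \langle |X|^2 - n, f \rangle_M = \int_{\mathbb{R}^n} f(x)\, \Delta M(x)\, dx \qquad \forall f \in W^1_{\cosh-1}(M).
\]

Of course, if $f \in W^1_{\cosh-1}(M)$ with $\nabla f \in \big(W^1_{\cosh-1}(M)\big)^n$ one actually gets $\boldsymbol{\Delta} f = \Delta f \in L^{\cosh-1}(M)$ and

\[
E_M[\Delta f] = E_M[X \cdot \nabla f].
\]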
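The identities $E_M[\Delta f] = E_M[X \cdot \nabla f] = \langle |X|^2 - n, f \rangle_M$ can be illustrated for $n = 2$ with a tensorized Gauss-Hermite rule. The sketch assumes the smooth test function $f(x,y) = e^{ax + by}$ with $a = 0.5$, $b = -0.3$ (an illustrative choice, not from the paper), for which all three quantities equal $(a^2+b^2)\, e^{(a^2+b^2)/2}$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

t, w = hermegauss(50)
w = w / np.sqrt(2.0 * np.pi)               # weights for the N(0, 1) density
X, Y = np.meshgrid(t, t, indexing="ij")
W = np.outer(w, w)                         # product weights for M on R^2

a, b = 0.5, -0.3                           # illustrative parameters
f = np.exp(a * X + b * Y)

mean_laplacian = float(np.sum(W * (a * a + b * b) * f))           # E_M[Laplacian f]
mean_x_grad = float(np.sum(W * (X * a + Y * b) * f))              # E_M[X . grad f]
mean_weighted = float(np.sum(W * (X ** 2 + Y ** 2 - 2.0) * f))    # <|X|^2 - n, f>_M
expected = (a * a + b * b) * np.exp(0.5 * (a * a + b * b))
print(mean_laplacian, mean_x_grad, mean_weighted, expected)
```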


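For technical purposes, we finally state the following Lemma.

Lemma 29. Given $w_1, w_2 \in W^1_{\cosh-1}(M)$, $v \in \mathcal{S}_M \cap W^1_{\cosh-1}(M)$ and $g = e_M(v)$ one has

\[
\big\langle \boldsymbol{\Delta} w_2, w_1 e^{v - K_M(v)} \big\rangle_{-1,1} = -E_g[\nabla w_1 \cdot \nabla w_2] + E_g[w_1 (X - \nabla v) \cdot \nabla w_2]. \tag{33}
\]

Proof. For simplicity, set $h = w_1 e^{v - K_M(v)}$. One knows from Proposition 27 that $h \in W^1_{(\cosh-1)_*}(M)$ so that, by definition,

\[
\langle \boldsymbol{\Delta} w_2, h \rangle_{-1,1} = \langle \nabla w_2, \delta h \rangle_M.
\]

Moreover, one checks easily that

\[
\delta h = Xh - \nabla h = Xh - e^{v - K_M(v)} \nabla w_1 - h \nabla v
\]

so that

\[
\langle \boldsymbol{\Delta} w_2, h \rangle_{-1,1} = \langle \nabla w_2, (X - \nabla v)\, h \rangle_M - \big\langle \nabla w_2, e^{v - K_M(v)} \nabla w_1 \big\rangle_M
\]

which gives the result. □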

6.2. Exponential family based on Orlicz-Sobolev spaces with Gaussian weights.

If we restrict the exponential family $\mathcal{E}(M)$ to $M$-centered random variables in $W^1_{\cosh-1}(M)$, that is in

\[
W_M = W^1_{\cosh-1}(M) \cap B_M = \big\{ U \in W^1_{\cosh-1}(M) \,\big|\, E_M[U] = 0 \big\},
\]

we obtain the following non-parametric exponential family

\[
\mathcal{E}_1(M) = \big\{ e^{U - K_M(U)} \cdot M \,\big|\, U \in W^1_{\cosh-1}(M) \cap \mathcal{S}_M \big\}.
\]

Because of the continuous embedding $W^1_{\cosh-1}(M) \hookrightarrow L^{\cosh-1}(M)$, the set $W^1_{\cosh-1}(M) \cap \mathcal{S}_M$ is open in $W_M$ and the cumulant functional $K_M\colon W^1_{\cosh-1}(M) \cap \mathcal{S}_M \to \mathbb{R}$ is convex and differentiable.

In a similar way, we can define

\[
{}^*W_M = W^1_{(\cosh-1)_*}(M) \cap {}^*B_M = \big\{ f \in W^1_{(\cosh-1)_*}(M) \,\big|\, E_M[f] = 0 \big\}
\]

so that for each $f \in \mathcal{E}_1(M)$ we have $\frac{f}{M} - 1 \in {}^*W_M$, see Remark 5.

Every feature of the exponential manifold carries over to this case. In particular, we can define the spaces

\[
W_f = W^1_{\cosh-1}(M) \cap B_f = \big\{ U \in W^1_{\cosh-1}(M) \,\big|\, E_f[U] = 0 \big\}, \qquad f \in \mathcal{E}_1(M),
\]

to be models for the tangent spaces of $\mathcal{E}_1(M)$. Note that the transport acts on these spaces,

\[
{}^e\mathbb{U}_f^g\colon W_f \ni U \mapsto U - E_g[U] \in W_g,
\]

so that we can define the tangent bundle to be

\[
T\mathcal{E}_1(M) = \big\{ (g, V) \,\big|\, g \in \mathcal{E}_1(M),\ V \in W_g \big\}
\]

and take as charts the restrictions of the charts defined on $T\mathcal{E}(M)$.

As a first example of application, note that the gradient of the BG-entropy

\[
\nabla H(f) = -\log f - H(f)
\]

is a vector field on $\mathcal{E}_1(M)$, which implies the solvability in $\mathcal{E}_1(M)$ of the gradient flow equation. Our concern here is to set up a framework for the study of evolution equations in $\mathcal{E}_1(M)$. The following sections are devoted to the discussion of a special functional and its gradient.

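6.3. Hyvärinen divergence. We begin with the following general properties of $W_M = \big\{ U \in W^1_{\cosh-1}(M) \,\big|\, E_M[U] = 0 \big\}$.

Proposition 30. Let $f, g \in \mathcal{E}_1(M)$, with $f = e_M(u)$, $g = e_M(v)$, $u, v \in W^1_{\cosh-1}(M) \cap \mathcal{S}_M$. The following hold.

(1) $E_M[\nabla u] = \mathrm{Cov}_M(u, X)$.

(2) $E_g[X - \nabla v] = 0$.

(3) $E_g[\nabla u] = \mathrm{Cov}_g(u, X - \nabla v)$.

(4) $E_g[\nabla u - \nabla v] = \mathrm{Cov}_g(u - v, X - \nabla v)$.

Proof. In all the sequel, we set $F = e^{u - K_M(u)} = \frac{f}{M}$ and $G = e^{v - K_M(v)} = \frac{g}{M}$. Recall from Proposition 27 that $F, G \in W^1_{(\cosh-1)_*}(M)$.

(1) One has $E_M[\partial_j u] = \langle 1, \partial_j u \rangle_M$ where $1 \in W^1_{(\cosh-1)_*}(M)$ is the constant function. Then, from (29), $E_M[\nabla u] = \langle \delta 1, u \rangle_M = \mathrm{Cov}_M(u, X)$ since $\delta 1 = X$.

(2) As above, one has

\[
E_g[\nabla v] = \int \nabla v\, e_M(v)\, dx = \int \nabla v\, G\, M\, dx
\]

and, from Proposition 27, $\nabla v\, G = \nabla G$ so that

\[
E_g[\nabla v] = E_M[\nabla G].
\]

Recall that $G \in W^1_{(\cosh-1)_*}(M)$ and $\nabla G = XG - \delta G$ so that

\[
E_g[\nabla v] = \langle XG, 1 \rangle_M - \langle \delta G, 1 \rangle_M
\]

where $1 \in W^1_{\cosh-1}(M)$ is the constant function equal to 1. Applying again (29) we get $\langle \delta G, 1 \rangle_M = \langle G, \nabla 1 \rangle_M = 0$ and

\[
E_g[\nabla v] = \langle XG, 1 \rangle_M = E_g[X].
\]

(3) Observe first that $E_g[\nabla u] = \langle G, \nabla u \rangle_M$. Using again (29), since $u \in W^1_{\cosh-1}(M)$ and $G \in W^1_{(\cosh-1)_*}(M)$, we have

\[
E_g[\nabla u] = \langle \delta G, u \rangle_M = \langle XG - \nabla G, u \rangle_M.
\]

Since $GM = g$ and $\nabla G = \nabla v\, G$ (see Proposition 27) we get $\langle XG - \nabla G, u \rangle_M = E_g[(X - \nabla v)\, u]$ which, together with Item (2), gives the result.

(4) Arguing as above one sees that $E_g[\nabla v] = E_g[(X - \nabla v)\, v]$ and therefore the conclusion follows from the previous item.

□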
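Items (2) and (3) of the proposition above can be checked numerically in dimension one. The sketch below is only illustrative: the chart coordinates $v(x) = 0.3x - 0.1x^2$ (taken up to an additive constant, which the normalization absorbs) and $u(x) = x^2 - 1$, the quadrature order and the helper `E_g` are assumptions made for the example. For these choices $g$ is the Gaussian density with mean $0.25$ and variance $1/1.2$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

x, w = hermegauss(60)
w = w / np.sqrt(2.0 * np.pi)             # weights for the N(0, 1) density M

v = 0.3 * x - 0.1 * x ** 2               # chart coordinate of g, up to a constant
dv = 0.3 - 0.2 * x                       # its derivative
u = x ** 2 - 1.0                         # chart coordinate of f
du = 2.0 * x

G = np.exp(v)
G = G / np.sum(w * G)                    # G = g / M = exp(v - K_M(v))

def E_g(h):
    """Expectation under g = G * M, computed as E_M[G h]."""
    return float(np.sum(w * G * h))

item2 = E_g(x - dv)                                     # should vanish
cov = E_g(u * (x - dv)) - E_g(u) * E_g(x - dv)          # Cov_g(u, X - grad v)
item3_gap = E_g(du) - cov                               # should vanish
print(item2, E_g(du), cov)
```

Both `item2` and `item3_gap` vanish up to quadrature error, and `E_g(du)` equals twice the mean of $g$, i.e. $0.5$.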


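Remark 10.

(1) The equations in Prop. 30 could be written without reference to the score (chart) mapping $s_M\colon f \mapsto u$ by writing $\nabla u = \nabla \log\left(\frac{f}{M}\right) = \nabla \log f + X$ to get

(a) $E_M\left[\nabla \log\left(\frac{f}{M}\right)\right] = \mathrm{Cov}_M\left(\log\left(\frac{f}{M}\right), X\right)$.

(b) $E_g[\nabla \log g] = 0$.

(c) $E_g\left[\nabla \log\left(\frac{f}{M}\right)\right] = -\mathrm{Cov}_g\left(\log\left(\frac{f}{M}\right), \nabla \log g\right)$.

(d) $E_g\left[\nabla \log\left(\frac{f}{g}\right)\right] = -\mathrm{Cov}_g\left(\log\left(\frac{f}{g}\right), \nabla \log g\right)$.

However, we feel that explicit reference to the chart clarifies the geometric picture.

(2) The mapping $f \mapsto \nabla \log f$ is a vector field in $T\mathcal{E}_1(M)$ with flow given by the translations:

\[
\frac{\partial}{\partial t} \log f_t = \nabla \log f_t, \qquad f(x,t) = f(x + 1t).
\]

(3) The KL-divergence $(f,g) \mapsto D(f \| g)$ has expression $(u,v) \mapsto dK_M(v)[u - v] - K_M(u) + K_M(v)$ in the chart centered at $M$, with partial derivative with respect to $v$ in the direction $w$ given by $\mathrm{Cov}_g(u - v, w)$. If the direction is $w = X - \nabla v = -\nabla \log g$, we have that $E_g[\nabla v - \nabla u]$ is the derivative of the KL-divergence along the vector field of translations.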


We introduce here the Hyvärinen divergence between two elements of $\mathcal{E}_1(M)$:

Definition 31 (Hyvärinen divergence). For each $f, g \in \mathcal{E}_1(M)$ the Hyvärinen divergence is the quantity

\[
\mathrm{DH}(g|f) = E_g\big[|\nabla \log f - \nabla \log g|^2\big].
\]

The expression in the chart centered at $M$ is

\[
\mathrm{DH}_M(v \| u) := \mathrm{DH}\big(e_M(v) \,|\, e_M(u)\big) = E_M\big[|\nabla u - \nabla v|^2\, e^{v - K_M(v)}\big],
\]

where $f = e_M(u)$, $g = e_M(v)$.
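To make the chart expression of the Hyvärinen divergence concrete, here is a one-dimensional sketch computed with Gauss-Hermite quadrature. Everything specific in it is an assumption made for illustration: the function name `hyvarinen_chart`, the quadrature order, and the two Gaussian examples. For $f = N(a,1)$ and $g = N(b,1)$ the chart coordinates have derivatives $u' = a$ and $v' = b$, so the divergence is $(a-b)^2$; for $f = N(0,2)$ and $g = M$ one has $u' = x/2$, $v' = 0$, so the divergence is $1/4$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

x, w = hermegauss(60)
w = w / np.sqrt(2.0 * np.pi)             # weights for the N(0, 1) density M

def hyvarinen_chart(du, v, dv):
    """E_M[|u' - v'|^2 exp(v - K_M(v))], i.e. E_g|u' - v'|^2 with g = e_M(v)."""
    G = np.exp(v(x))
    G = G / np.sum(w * G)                # normalization plays the role of K_M(v)
    return float(np.sum(w * G * (du(x) - dv(x)) ** 2))

a, b = 0.7, -0.4
# f = N(a, 1), g = N(b, 1): u(x) = a x + const, v(x) = b x + const.
dh_shift = hyvarinen_chart(lambda t: a + 0.0 * t, lambda t: b * t, lambda t: b + 0.0 * t)

# f = N(0, 2), g = M: u(x) = x^2 / 4 + const, v = 0.
dh_scale = hyvarinen_chart(lambda t: 0.5 * t, lambda t: 0.0 * t, lambda t: 0.0 * t)
print(dh_shift, dh_scale)
```

The printed values should agree with $(a-b)^2 = 1.21$ and $0.25$ up to quadrature error.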

Remark 11. (1) The mapping $\mathcal{E}_1(M) \ni f \mapsto \nabla \log f = f^{-1} \nabla f$ is, in statistical terms, an estimating function or a pivot, because $E_f[\nabla \log f] = 0$, $f \in \mathcal{E}_1(M)$. This means that it is a random variable whose value is zero in the mean if $f$ is correct. If $g$ is correct, then the expected value is $E_g[\nabla \log f] = E_g[\nabla \log f - \nabla \log g] = -\mathrm{Cov}_g\left(\log\left(\frac{f}{g}\right), \nabla \log g\right)$. The second moment of $\nabla \log f - \nabla \log g$ was used by Hyvärinen as a measure of deviation from $f$ to $g$, [21].

(2) Hyvärinen's work has been used to discuss proper scoring rules in [30].

(3) The same notion is known in Physics under the name of relative Fisher information, e.g., see [10].

In the following we denote by $d$ the gradient of a function defined on the exponential manifold $\mathcal{E}_1(M)$, which is a random variable.

Proposition 32. 

(1) The Hyvarinen divergence is finite and infinitely differentiable everywhere in both 
variables. 

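(2) $d(f \mapsto \mathrm{DH}(g|f)) = -2\nabla\log g \cdot \nabla\log\frac{f}{g} - 2\Delta\log\frac{f}{g}$.

(3) $d(g \mapsto \mathrm{DH}(f|g)) = 2\nabla\log g \cdot \nabla\log\frac{f}{g} + 2\Delta\log\frac{f}{g} + \mathrm{DH}(f|g)$.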

Proof. 

(1) For each $w \in V$ the gradient $\nabla w$ is in $(L^{\cosh-1}(g))^n = (L^{\cosh-1}(M))^n$ for all $g \in \mathcal{E}_1(M)$, hence it is $g$-square integrable for all $g \in \mathcal{E}_1(M)$. Moreover, the squared norm function $S_M \times (B_M)^n \ni (v,w) \mapsto E_{e_M(v)}[|w|^2]$ is $\infty$-differentiable because it is the moment functional

$$E_g[|w|^2] = \sum_{j=1}^n \left( d^2K_M(v)[w_j,w_j] + (dK_M(v)[w_j])^2 \right).$$

We can compose this function with the linear function

$$(V_M \cap S_M) \times V_M \ni (v,u) \mapsto (v,w) = (v, \nabla(u - v)) \in S_M \times B_M^n.$$

(2) Let $g = e_M(v)$, $f = e_M(u)$, $u, v \in S_M \cap V$ be given. For any $w \in V$, we compute first the directional derivative:

$$d(u \mapsto \mathrm{DH}_M(v\|u))[w] = 2E_M\left[\nabla w \cdot (\nabla u - \nabla v)e^{v - K_M(v)}\right] = 2E_g[\nabla w \cdot (\nabla u - \nabla v)],$$

where we notice that all the terms are well defined whenever $u, v \in S_M \cap V$, $w \in V$. Using now (33) with $w_2 = u - v$ and $w_1 = w$ we get that

$$d(u \mapsto \mathrm{DH}_M(v\|u))[w] = 2E_g[w(X - \nabla v)\cdot\nabla(u - v)] - 2\langle\Delta(u - v), w\rangle_g$$

$$= 2E_M\left[w e^{v - K_M(v)}(X - \nabla v)\cdot\nabla(u - v)\right] - 2\langle\Delta(u - v), w e^{v - K_M(v)}\rangle_M.$$

Since this is true for any $w \in V$ we get

$$d(u \mapsto \mathrm{DH}_M(v\|u)) = 2(X - \nabla v)\cdot\nabla(u - v) - 2\Delta(u - v),$$

where, of course, $\Delta(u - v)$ is meant in the weak sense, through the duality pairing (notice that $w \in W^1_{\cosh-1}(M) \subset W^1_{(\cosh-1)_*}(M)$). The formula for the partial gradient in absolute variables follows from

$$(X - \nabla v)\cdot\nabla(u - v) = -\nabla\log g \cdot \nabla\log\frac{f}{g},$$

where, with a slight abuse of notation, we identify $\Delta\log\frac{f}{g}$ with $\Delta(u - v)$.

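(3) As above, let $g = e_M(v)$, $f = e_M(u)$, $u, v \in S_M \cap V$ be given. For any $w \in V$, we compute first the directional derivative. One gets now

$$d(v \mapsto \mathrm{DH}_M(v\|u))[w] = -2E_M\left[\nabla w\cdot(\nabla u - \nabla v)e^{v - K_M(v)}\right] + E_M\left[(w - dK_M(v)[w])\,|\nabla u - \nabla v|^2 e^{v - K_M(v)}\right].$$

One recognizes in the first term $-d(u \mapsto \mathrm{DH}_M(v\|u))[w]$, while the second term is given by

$$E_M\left[(w - dK_M(v)[w])\,|\nabla u - \nabla v|^2 e^{v - K_M(v)}\right] = E_g\left[(w - E_g[w])\,|\nabla u - \nabla v|^2\right] = \mathrm{Cov}_g\left(w, |\nabla u - \nabla v|^2\right).$$

As in the previous item, this gives the result.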

□ 
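The integration by parts behind the gradient formula in item (2) of the proof can be checked numerically. The sketch below is a one-dimensional illustration under assumptions of our own: $M$ is the standard Gaussian, $v(x) = cx$ (so that $g = e_M(v)$ is the Gaussian $N(c, 1)$), $u(x) = 0.2x - 0.1x^2$ up to an additive constant that does not affect derivatives, and the test direction is $w(x) = \sin x$. It compares $2E_g[\nabla w\cdot(\nabla u - \nabla v)]$ with $2E_g[w((X - \nabla v)\cdot\nabla(u - v) - \Delta(u - v))]$. All names and numerical values are hypothetical.

```python
import math

C = 0.3  # v(x) = C * x, so g = e^{v - K_M(v)} M is the N(C, 1) density

def expect_g(func, n=20001, half_width=12.0):
    """E_g[func(X)] for g = N(C, 1), by the composite trapezoid rule."""
    lo, hi = C - half_width, C + half_width
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * func(x) * math.exp(-0.5 * (x - C) ** 2)
    return total * h / math.sqrt(2.0 * math.pi)

def grad_v(x):
    return C

def grad_u(x):
    # u(x) = 0.2 x - 0.1 x^2 (additive constants are irrelevant here)
    return 0.2 - 0.2 * x

def laplacian_u_minus_v(x):
    # second derivative of u - v; v is linear, so only u contributes
    return -0.2

def w(x):
    return math.sin(x)

def grad_w(x):
    return math.cos(x)

def directional_derivative():
    """2 E_g[grad w . (grad u - grad v)]."""
    return 2.0 * expect_g(lambda x: grad_w(x) * (grad_u(x) - grad_v(x)))

def gradient_pairing():
    """2 E_g[w ((X - grad v) . grad(u - v) - Laplacian(u - v))]."""
    def integrand(x):
        drift = (x - grad_v(x)) * (grad_u(x) - grad_v(x))
        return w(x) * (drift - laplacian_u_minus_v(x))
    return 2.0 * expect_g(integrand)

if __name__ == "__main__":
    print(directional_derivative(), gradient_pairing())
```

The two values should coincide up to quadrature error, since $\nabla g = -(X - \nabla v)g$ for $g = e_M(v)$ with $M$ the standard Gaussian.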


As is well documented, the Hyvärinen divergence is a powerful tool for the study of general diffusion processes. We have just shown that the Information Geometry formalism and the exponential manifold approach are robust enough to allow for a generalization to Orlicz-Sobolev spaces. We believe then that, just as the Boltzmann equation can be studied through the exponential manifold formalism in $L^{\cosh-1}(M)$, general diffusion processes can be investigated in the Orlicz-Sobolev setting with the formalism discussed in the present section.

This is a plan for future work. 


7. Conclusions and Discussion 

We have shown that well-known geometric features of problems in Statistical Physics can
be turned into precise formal results via a careful consideration of the relevant functional 
analysis. 

In particular the notion of flow in a Banach manifold modeled on Orlicz spaces can 
be used to clarify arguments based on the evolution of the classical Boltzmann-Gibbs 
entropy in the vector field associated to the Boltzmann equation.

In the last section we have shown how to construct a similar theory in the case where the generalized entropy under consideration is the so-called Fisher functional or Hyvärinen divergence. Such a generalised entropy is particularly well-suited for the study of general diffusion problems, and the results presented in Section 6 can be seen as the first outcome of an ongoing joint research program.